(57) Abstract



A method for estimating rendering times for 
three-dimensional graphics objects and scenes is
disclosed. The rendering times may be estimated in 
real-time, thus allowing a graphics system to alter 
rendering parameters (such as level of detail and number 
of samples per pixel) to maintain a predetermined 
minimum frame rate. Part of the estimation may be 
performed offline to reduce the time required to perform 
the final estimation. The method may also detect 
whether the objects being rendered are pixel fill limited 
or polygon overhead limited. This information may 
allow the graphics system to make more intelligent 
choices as to which rendering parameters should be 
changed to achieve the desired minimum frame rate. 
A software program configured to efficiently estimate 
rendering times is also disclosed. 
WO 99/41704 PCT/US99/03227
TITLE: ESTIMATING GRAPHICS SYSTEM PERFORMANCE FOR POLYGONS 



BACKGROUND OF THE INVENTION 



1. Field of the Invention

This invention relates generally to the field of computer graphics and, more particularly, to 
estimating the polygon rendering performance of three-dimensional graphics systems. 



2. Description of the Related Art 

A computer system typically relies upon its graphics system for producing visual output on the
computer screen or display device. Early graphics systems were only responsible for taking what the
processor produced as output and displaying it on the screen. In essence, they acted as simple translators or
interfaces. Modern graphics systems, however, incorporate graphics processors with a great deal of
processing power. They now act more like coprocessors than simple translators. This change is due to
the recent increase in both the complexity and amount of data being sent to the display device. For example,
modern computer displays have many more pixels, greater color depth, and higher refresh rates than earlier
models. Similarly, the images displayed are now more complex and may involve advanced rendering
techniques such as anti-aliasing and texture mapping.

As a result, without considerable processing power in the graphics system, the CPU would spend a
great deal of time performing graphics calculations. This could rob the computer system of the processing
power needed for performing other tasks associated with program execution and thereby dramatically reduce
overall system performance. With a powerful graphics system, however, when the CPU is required to draw a
box on the screen, the CPU is freed from having to compute the position and color of each pixel. Instead, the
CPU may send a request to the video card stating "draw a box at these coordinates." The graphics system then
draws the box, freeing the processor to perform other tasks.

Generally, a graphics system in a computer (also referred to as a graphics accelerator) is a type of 
video adapter that contains its own processor to boost performance levels. These processors are specialized 
for computing graphical transformations, so they tend to achieve better results than the general-purpose CPU 
used by the computer system. In addition, they free up the computer's CPU to execute other commands while
the graphics system is handling graphics computations. The popularity of graphical applications, and
especially multimedia applications, has made high performance graphics systems a common feature of 
computer systems. Most computer manufacturers now bundle a high performance graphics system with their 
systems. 

Since graphics systems typically perform only a limited set of functions, they may be customized and
therefore far more efficient at graphics operations than the computer's general-purpose central processor.
While early graphics systems were limited to performing two-dimensional (2D) graphics, their functionality
has now grown to also include three-dimensional (3D) graphics rendering, including 3D graphics operations
such as shading, fogging, alpha-blending, and specular highlighting.

The processing power of 3D graphics systems has been improving at a breakneck pace. A few years
ago, shaded images of simple objects could only be rendered at a few frames per second, while today's
systems support rendering of complex objects at 60Hz or higher. This higher performance allows modern
graphics applications to increase the realism of the scene being displayed.

One common method used to increase the realism of three-dimensional rendered objects is to
increase the number of polygons used to display the object. Most modern graphics systems render objects by
first dividing or "tessellating" the surface of the object into a number of polygons (i.e., closed plane figures
bounded by straight lines). Each polygon is then rendered individually. Rendering typically involves the
following steps: (1) calculating a number of parameters for each vertex of the polygon, and (2) interpolating
from the vertex parameters to fill in the polygon. Examples of vertex parameters may include color
information, translucency information, depth information, lighting information, and texture information.

By increasing the number of polygons used to display an object, the object may appear smoother and
may have a more realistic textured appearance. Figure 1A illustrates a sphere tessellated into a first number of
polygons. Figure 1B is an example of the same sphere tessellated into a much higher number of polygons. As
shown by the figures, a more realistic scene may be rendered by using larger numbers of smaller polygons.
Note that since all polygons are typically broken into triangles for rendering, the terms "polygon" and "triangle"
shall be used interchangeably herein.

While this technique improves realism, it also increases the processing burden on a graphics system.
Previous graphics applications used large polygons that contained a large number of pixels. Thus, the
"overhead" of setting up each polygon consumed a relatively small portion of the graphics system's overall
processing resources, while the process of interpolating the pixels within the polygon required the majority of
the graphics system's processing power. These systems are referred to as "pixel fill limited" because the
limiting performance factor is the number of pixels the graphics system is capable of calculating. Modern
applications, however, are now using polygons that may contain only one or two pixels (or even less
than one pixel, in some cases). Thus, the work of setting up polygons may require more time than the actual
pixel calculation process. These systems are referred to as "polygon overhead limited" because the overhead
associated with setting up polygons is the performance limiting factor. Note that a particular graphics system may
be polygon overhead limited for a particular scene (e.g., one with many small polygons) and pixel fill limited
for a different scene (e.g., one with larger polygons or more complex pixel-level enhancements).

Figure 2 is a graph illustrating one possible performance limit curve for a graphics system. As
shown in the figure, once the polygon area falls below a particular size $a_c$, the system's performance is limited
by the polygon overhead. Similarly, once the polygon size rises above $a_c$, performance is limited by the
maximum pixel fill rate.

As previously noted, the processing power of graphics systems has increased rapidly in the past few 
years. However, even with these great increases, new applications continue to demand even greater 
performance. For example, some computer games and virtual reality programs require real time rendering of
multiple, complex, three-dimensional objects at high frame rates. These graphics intensive applications place
high demands upon graphics system performance and may easily exceed the graphics system's capabilities. 

One possibility is to lower the frame rate when the application exceeds the performance capabilities
of the graphics system. However, this is not always possible because some graphics applications have
minimum frame rates below which the applications become unusable. For example, if the frame rate of a 3D
computer game falls below a certain level, the movements and animation on the screen will become jerky.
Furthermore, if the frame rate drops below a critical level, then the delay between when the user performs an
action (e.g., firing a missile) and a graphic representation of that action appearing on the screen will be so long
as to make the game unplayable. Thus, for many applications the frame rate may not be lowered below a
certain predetermined level, even when the complexity of the scene being rendered increases dramatically.

In contrast, however, if the system is polygon overhead limited, the number of polygons displayed
can be adjusted without the limitations and drawbacks that lowering the frame rate has. For example,
assuming a particular graphics system that can render a maximum of 1000 polygons per frame at a particular
frame rate (due to polygon overhead), if a single moving sphere is displayed, then the sphere may be divided
(i.e., "tessellated") into no more than 1000 polygons without affecting the frame rate. If two similar moving
spheres are displayed, the graphics system can either cut the frame rate in half (with the negative side effects
discussed above), or it can display each sphere using only 500 polygons. Using the second alternative may
result in the spheres appearing more jagged (i.e., the spheres' surfaces will not appear as smooth), but without
any added jerkiness to their movement. This technique may also be referred to as reducing the level-of-detail
(LOD) of the scene.
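
The budget arithmetic in this example can be sketched as follows. This is only an illustration, assuming that similar objects share the per-frame polygon budget equally; the function name `polygons_per_object` and its parameters are hypothetical and do not come from the disclosure.

```python
def polygons_per_object(polygon_budget: int, num_objects: int) -> int:
    """Split a fixed per-frame polygon budget evenly across similar objects.

    polygon_budget: maximum polygons the system can render per frame at the
        target frame rate when polygon overhead limited (assumed known).
    num_objects: number of similar objects visible in the frame.
    """
    if num_objects <= 0:
        raise ValueError("num_objects must be positive")
    # Integer division keeps the total at or below the budget.
    return polygon_budget // num_objects


# One sphere may use the whole budget; two spheres get half each.
print(polygons_per_object(1000, 1))
print(polygons_per_object(1000, 2))
```

Holding the budget fixed and lowering the per-object polygon count trades surface smoothness for a constant frame rate, which is the level-of-detail reduction described above.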

If the system is pixel fill limited, another alternative to reducing the frame rate is to reduce the
overall number of pixels being rendered. This may be accomplished by reducing the pixel area of a given
object, reducing the number of objects displayed, or reducing the number of pixels in the overall scene.

Ideally, the graphics system would be able to accurately determine the optimum size and number of 
polygons and the optimum number of pixels. This would allow the graphics system to optimize the displayed 
scene to contain the maximum number of polygons without lowering the frame rate and without wasting
performance on polygon overhead.

Thus, a graphics system capable of efficiently determining the performance limits for a particular 
scene is desired. Similarly, a method for efficiently determining graphics system performance limits is also 
desired. 

SUMMARY OF THE INVENTION

The problems outlined above may in part be solved by a graphics system configured to estimate its
rendering performance for a particular set of geometry data. In some embodiments, the graphics system may
be configured to estimate scene rendering times on a frame-by-frame basis, and then adjust rendering
parameters (e.g., the number of polygons, pixels, samples or features) to maintain a minimum desirable frame
rate.

In one embodiment, the graphics system may estimate rendering performance by calculating an
"effective polygon area" for each polygon in the geometry data. The effective polygon area is an estimate of
rendering time for a polygon that takes into consideration the polygon's effect on the graphics system in light
of the system's pixel fill and polygon overhead limitations. The graphics system may estimate the sum total
of the effective area for all polygons to generate a total "effective area". This effective area may be calculated
for both model space and screen space. As used herein, "model space" refers to the coordinate system that the
geometry data is specified in, while "screen space" refers to the coordinate system defined by the pixels on the
display device. Similarly, a "geometry data set" refers to graphics data that is received and rendered into one
or more frames by the graphics system. The graphics data may comprise vertices and/or instructions (e.g.,
opcodes) that provide the graphics system with enough information to render (i.e., draw) the data. The
graphics data may represent a combination of both 2D and 3D objects to be rendered.
In some embodiments, the effective area may include all or part of the back-facing sides of the
polygons. These embodiments may generate more accurate estimates for graphics systems that utilize
significant system processing resources to cull back-facing polygons.

Furthermore, the effective area may include a "false area" value for polygons below a predetermined
size. False area refers to an additional theoretical area that, if rendered as part of the polygon, would
approximate the overhead processing time for polygons below a predetermined size. Another way to
compensate for overhead processing times of small polygons is to simply round all polygons below a
predetermined critical area (referred to herein as $a_c$) up to the critical area $a_c$. The value $a_c$ represents the
minimum size of a polygon below which the polygon set-up overhead becomes a significant limiting factor.
For example, a large polygon (e.g., 40 pixels in area) may require only two clock cycles to set up and forty
clock cycles to render. In this case the overhead associated with setting up the polygon is relatively small
when compared with the time required to render the polygon. However, a small polygon (e.g., only a single
pixel in area or even sub-pixel in area) may still require two clock cycles to set up but only one clock cycle to
render. Thus, for smaller polygons the overhead of setting up the polygon may become a significant
performance-limiting aspect. By rounding smaller polygons up to the predetermined critical area $a_c$, the effect
of set-up overhead may be calculated for small polygons. Similarly, adding false area to the area of small
polygons performs the same function. Note that the value $a_c$ may vary across different graphics systems and may
also vary according to the particular configuration of the graphics system (e.g., the color depth).
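
The two compensation schemes described above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the critical area of 2 pixels is an assumed value chosen to match the two-cycle set-up example, and the false area is assumed to be exactly the amount needed to bring a small polygon up to $a_c$.

```python
def effective_area_rounded(area: float, a_c: float) -> float:
    """Round a polygon's area up to the critical area a_c."""
    return max(area, a_c)


def effective_area_false(area: float, a_c: float) -> float:
    """Add 'false area' so that a small polygon's total reaches a_c.

    Assumption: the false area equals the shortfall (a_c - area) and is
    zero for polygons at or above the critical area.
    """
    false_area = max(0.0, a_c - area)
    return area + false_area


def total_effective_area(areas, a_c: float) -> float:
    """Sum the effective area over every polygon in a geometry data set."""
    return sum(effective_area_rounded(a, a_c) for a in areas)


# A 40-pixel polygon is unaffected; 1-pixel and sub-pixel polygons are
# each charged the assumed critical area of 2 pixels.
print(total_effective_area([40.0, 1.0, 0.5], a_c=2.0))
```

Under these assumptions the two schemes give identical per-polygon results, which matches the statement that adding false area performs the same function as rounding up.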

The system may be configured to use pre-calculated values of $a_c$ to determine whether a particular set
of geometry data will be polygon overhead bound or pixel fill bound. As noted above, the term polygon
overhead bound refers to when the graphics system's performance is limited by per-polygon processing (e.g.,
lighting). In contrast, the term pixel fill bound refers to when the graphics system's performance is limited by
per-pixel calculations (e.g., transparency, texturing, and anti-aliasing) or bandwidth.

In one embodiment, the graphics system may be configured to perform these calculations in real-time
or near-real-time. As used herein, a task is performed in "real time" if it is performed without causing
noticeable delay to the average user (e.g., on a per-frame or per-display device refresh cycle basis).
Conversely, as used herein, a task is performed "offline" if it is not performed in real time (i.e., it causes
noticeable delay to the user).

In some embodiments, after determining that a particular set of graphics data will be polygon
overhead bound or pixel fill bound when rendered and that the graphics system's frame rate will fall below a
predetermined threshold, the graphics system may dynamically make modifications to the scene being
rendered or the rendering parameters in order to raise the frame rate above the threshold. For example, if the
set of graphics data is pixel fill bound, then the graphics system may be configured to reduce the number or
density of samples or pixels in the scene. Alternatively, the graphics system may reduce the overall size of
the object or image being rendered.

In contrast, if the set of graphics data is polygon overhead bound and the system's frame rate falls
below a predetermined threshold, the graphics system may reduce the level of detail (LOD) either by
tessellating the scene or object using larger polygons or by selecting a pre-tessellated set of
graphics data with fewer polygons. Reducing the number of objects displayed may further increase frame
rates.
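
The decision logic of the preceding paragraphs might be sketched as below. The classification rule is an assumption made for illustration (the data set is labeled by whichever cost term is larger once small polygons are charged $a_c$ each); the disclosure does not state this rule, and all function names and return labels are hypothetical.

```python
def classify_bound(screen_areas, a_c: float) -> str:
    """Label a geometry data set by which cost term dominates.

    Assumed rule: polygons smaller than a_c each cost a_c (set-up overhead),
    while larger polygons cost their own area (pixel fill).
    """
    overhead_cost = sum(a_c for a in screen_areas if a < a_c)
    fill_cost = sum(a for a in screen_areas if a >= a_c)
    return "polygon_overhead" if overhead_cost > fill_cost else "pixel_fill"


def choose_adjustment(bound: str, estimated_fps: float, min_fps: float) -> str:
    """Pick a rendering-parameter change when the frame rate would drop too low."""
    if estimated_fps >= min_fps:
        return "no_change"
    if bound == "pixel_fill":
        # Fewer or less dense samples/pixels, or a smaller image.
        return "reduce_samples_or_pixels"
    # Larger polygons or a pre-tessellated set with fewer polygons.
    return "reduce_level_of_detail"


tiny_polygons = [0.5] * 100 + [10.0]
bound = classify_bound(tiny_polygons, a_c=2.0)
print(bound, choose_adjustment(bound, estimated_fps=45.0, min_fps=60.0))
```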
In some embodiments, the graphics system may calculate "cumulative probability distributions" to
support the real-time calculation of performance limits for scenes or objects to be rendered. The cumulative
probability distributions represent the cumulative distribution of polygon sizes in the particular geometry data
set. The cumulative probability distribution may be calculated in two forms: (1) as the probability of a
randomly selected polygon having an area of $a_c$ or less (referred to as $f(a_c)$), or (2) as the probability that a
randomly chosen point on the surface belongs to a polygon with an area of less than or equal to $a_c$ (referred to
as $g(a_c)$). The functions $f(a_c)$ and $g(a_c)$ may also be calculated for screen space by applying a predetermined
constant $s$ that represents a scaling factor indicative of the scaling that takes place when converting model
space polygons to screen space polygons. These screen space cumulative probability distribution functions
are referred to herein as $\hat{f}(a_c s)$ and $\hat{g}(a_c s)$, respectively.
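
For a concrete list of polygon areas, the two cumulative forms can be computed directly. The sketch below is illustrative only; the function names are hypothetical, and it treats the data set as an explicit list of areas rather than a continuous distribution.

```python
def f_cumulative(areas, a: float) -> float:
    """Probability that a randomly selected polygon has area <= a."""
    if not areas:
        raise ValueError("areas must not be empty")
    return sum(1 for x in areas if x <= a) / len(areas)


def g_cumulative(areas, a: float) -> float:
    """Probability that a random surface point lies in a polygon of area <= a.

    Each polygon is weighted by its own area, so large polygons count more.
    """
    total = sum(areas)
    if total <= 0:
        raise ValueError("total area must be positive")
    return sum(x for x in areas if x <= a) / total


areas = [1.0, 1.0, 2.0, 4.0]
print(f_cumulative(areas, 2.0))  # three of the four polygons
print(g_cumulative(areas, 2.0))  # 4 of the 8 units of surface area
```

The two values differ because the count-based form treats every polygon equally while the area-based form is dominated by the larger polygons.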

To allow real-time or near-real-time estimation of geometry rendering times, a number of
preliminary calculations may be performed ahead of time in an off-line fashion. While these calculations may
also be generated in a real-time or near-real-time fashion, performing them off-line may further reduce the
latency of the remaining real-time portion of the calculations. For example, several different values for $a_c$
(corresponding to different configurations of the graphics system) may be calculated offline. The function
$p_m(x)$ may also be calculated off-line, wherein $p_m(x)$ is a model space probability distribution according to
the following Dirac-delta function: $p_m(a) = \delta(A - a)$, wherein $A$ is the area of a single polygon in model
space.

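The system may also calculate $\hat{f}(a_c s)$ and $\hat{g}(a_c s)$ off-line by numerical integration according to the
following equations:

$$\hat{f}(a) = \int_0^a p_m(x)\,dx + \int_a^\infty \frac{a}{x} \cdot p_m(x)\,dx, \quad \text{wherein } \hat{f}(a) = \hat{f}(a_c s); \text{ and}$$

$$\hat{g}(a) = \int_0^a \int_x^\infty 2 \cdot \frac{x}{y^2} \cdot p_m(y)\,dy\,dx, \quad \text{wherein } \hat{g}(a) = \hat{g}(a_c s).$$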
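
When the polygon areas are available as a finite list, the integral for $\hat{f}$ reduces to an average. The sketch below assumes that $p_m$ is the empirical distribution of the listed areas and that the areas and the threshold `a` are expressed in the same units; the function name `f_hat` is hypothetical, and only $\hat{f}$ is covered here.

```python
def f_hat(areas, a: float) -> float:
    """Discrete form of f_hat(a) = int_0^a pm(x) dx + int_a^inf (a/x) pm(x) dx.

    Polygons with area <= a contribute with full weight (first integral);
    larger polygons contribute with weight a/x (second integral).
    Assumption: pm is the empirical distribution of the listed areas.
    """
    if not areas:
        raise ValueError("areas must not be empty")
    total = 0.0
    for x in areas:
        total += 1.0 if x <= a else a / x
    return total / len(areas)


# Weights are 1, 1, 0.5 and 0.25, averaged over four polygons.
print(f_hat([1.0, 2.0, 4.0, 8.0], 2.0))
```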



By calculating $a_c$, $\hat{f}(a_c s)$, and $\hat{g}(a_c s)$ off-line, estimating the time required to render the graphics data
may be efficiently accomplished in real-time by: (1) calculating the scaling factor $s$ from the modeling and
viewing matrices, and (2) evaluating the rendering rate according to the following formula:

$$\text{render\_time} = \frac{\text{effective\_area}}{\text{pixel\_fill\_rate}} = \frac{\left(a_c \cdot n \cdot \hat{f}(a_c s)\right) + \left(\tfrac{1}{2} \cdot s \cdot \text{total\_model\_space\_area}\right) \cdot \left(1 - \hat{g}(a_c s)\right)}{\text{pixel\_fill\_rate}}$$
wherein the term $\left(\tfrac{1}{2} \cdot s \cdot \text{total\_model\_space\_area}\right)$ is used to approximate the total screen area.
Note that the total model space area may also be pre-computed to further reduce latency.
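
Once the off-line quantities are available, the real-time step is a handful of multiplications. The sketch below evaluates the render time as the effective area divided by the pixel fill rate, with the effective area taken as an overhead term plus a fill term. Two points are assumptions: `n` is taken to be the number of polygons in the data set, which this passage does not define, and the screen-area term uses a factor of one half. The function and parameter names are hypothetical.

```python
def estimate_render_time(a_c: float, n: int, f_hat_val: float, g_hat_val: float,
                         s: float, total_model_space_area: float,
                         pixel_fill_rate: float) -> float:
    """Estimate render time as effective_area / pixel_fill_rate.

    a_c: critical polygon area (pre-calculated off-line).
    n: number of polygons (assumed meaning of n).
    f_hat_val, g_hat_val: pre-calculated screen space cumulative values.
    s: model-to-screen scaling factor from the modeling and viewing matrices.
    pixel_fill_rate: pixels the system can fill per unit time.
    """
    if pixel_fill_rate <= 0:
        raise ValueError("pixel_fill_rate must be positive")
    # Small polygons: each is charged the critical area a_c.
    overhead_term = a_c * n * f_hat_val
    # Approximate total screen area, then keep the share in large polygons.
    screen_area = 0.5 * s * total_model_space_area
    fill_term = screen_area * (1.0 - g_hat_val)
    return (overhead_term + fill_term) / pixel_fill_rate


t = estimate_render_time(a_c=2.0, n=1000, f_hat_val=0.5, g_hat_val=0.25,
                         s=4.0, total_model_space_area=10000.0,
                         pixel_fill_rate=1.0e6)
print(t)
```

With these made-up inputs the overhead term is 1000 and the fill term is 15000, so the effective area is 16000 pixels.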

In one embodiment, the render time estimate may be further enhanced by addressing the possibility
of frustum clipping. Frustum clipping refers to the process of culling out polygons that are not within the visible
area of the display device (e.g., off-screen polygons). A value $\alpha$ may be computed in real time to represent an
estimate of the fraction of polygons that are outside the current view frustum. Once calculated, this value $\alpha$
may be incorporated into the render time estimate as follows:

$$\text{render\_time} = \frac{(1 - \alpha) \cdot \text{effective\_area} + \alpha \cdot a_c \cdot n}{\text{pixel\_fill\_rate}}$$

While $\alpha$ may be calculated in a number of different ways, one simple estimation may be obtained by
examining the object's bounding box and then determining what portion of the bounding box falls outside the
displayable region. For a more accurate estimate, a plurality of smaller bounding boxes may be used for the
object's polygons.
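
The bounding-box estimate of the out-of-frustum fraction can be sketched as follows. This is an illustration under stated assumptions: the bounding box has already been projected to a screen-space rectangle, polygons are spread evenly over that rectangle (so the area fraction stands in for the polygon fraction), and the function name and tuple layout are hypothetical.

```python
def fraction_outside(bbox, viewport) -> float:
    """Estimate the fraction of an object lying outside the displayable region.

    bbox, viewport: rectangles given as (x_min, y_min, x_max, y_max) in
    screen coordinates. Returns a value between 0.0 and 1.0.
    """
    bx0, by0, bx1, by1 = bbox
    vx0, vy0, vx1, vy1 = viewport
    bbox_area = (bx1 - bx0) * (by1 - by0)
    if bbox_area <= 0:
        raise ValueError("bounding box must have positive area")
    # Width and height of the overlap, clamped at zero when disjoint.
    overlap_w = max(0.0, min(bx1, vx1) - max(bx0, vx0))
    overlap_h = max(0.0, min(by1, vy1) - max(by0, vy0))
    return 1.0 - (overlap_w * overlap_h) / bbox_area


viewport = (0.0, 0.0, 100.0, 100.0)
# Half of this box hangs off the left edge of the screen.
print(fraction_outside((-50.0, 0.0, 50.0, 100.0), viewport))
```

Applying the same function to several smaller boxes and combining the results, weighted by the polygons each box contains, would correspond to the more accurate variant mentioned above.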

BRIEF DESCRIPTION OF THE DRAWINGS 

The foregoing, as well as other objects, features, and advantages of this invention may be more
completely understood by reference to the following detailed description when read together with the
accompanying drawings in which:

Figure 1A illustrates a sphere tessellated using a low number of polygons;
Figure 1B illustrates a sphere tessellated using a higher number of polygons;
Figure 2 is a graph illustrating one possible performance limit curve for a graphics system;
Figure 3 is a diagram of an example computer system;
Figure 4 is a simplified block diagram of the computer system of Figure 3;
Figure 5 is a block diagram illustrating more details of one embodiment of the graphics system of
Figure 4;
Figure 6 is an image of five different three-dimensional rendered objects;
Figures 7-11 are graphs of triangle parameterizations for each of the objects depicted in Figure 6;
Figure 12 is a diagram illustrating the calculation of a triangle's aspect ratio and skew;
Figures 13-14 are graphs that show histograms of aspect ratio in model space and screen space for
two objects from Figure 6;
Figures 15A-B illustrate the transformation from model space to screen space;
Figure 16 is a diagram illustrating calculations on the T Rex from Figure 6;
Figures 17A-B are diagrams illustrating regions of the human eye;
Figure 18 is a table illustrating various display devices' characterizations;
Figure 19A is a flowchart illustrating one embodiment of a method for estimating performance;
Figure 19B is a diagram illustrating one embodiment of a graphics data set;
Figure 20 is a diagram illustrating the proof of constancy of projection from model space to screen
space; and
Figure 21 is a diagram of one embodiment of a computer network connecting multiple computers.


While the invention is susceptible to various modifications and alternative forms, specific 
embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It 
should be understood, however, that the drawings and detailed description thereto are not intended to limit the 
invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, 
equivalents, and alternatives falling within the spirit and scope of the present invention as defined by the
appended claims. 

DETAILED DESCRIPTION OF SEVERAL EMBODIMENTS 

Computer System — Figure 3

Referring now to Figure 3, one embodiment of a computer system 80 which includes a
three-dimensional (3-D) graphics system is shown. The 3-D graphics system may be comprised in any of various
systems, including a computer system, network PC, Internet appliance, a television, including HDTV systems
and interactive television systems, personal digital assistants (PDAs), flight simulators, driving simulators,
ship simulators, virtual reality environments, and other devices which display 2D and/or 3D graphics, among
others.


As shown, the computer system 80 comprises a system unit 82 and a video monitor or display device
84 coupled to the system unit 82. The display device 84 may be any of various types of display monitors or
devices (e.g., a CRT, LCD, or gas-plasma display). Various input devices may be connected to the computer
system, including a keyboard 86 and/or a mouse 88, or other input device (e.g., a trackball, digitizer, or
tablet). Application software may be executed by the computer system 80 to display 3-D graphical objects on
display device 84. As described further below, in one embodiment the 3-D graphics system in computer
system 80 is configured to efficiently estimate polygon rendering performance and dynamically adjust
rendering parameters to improve the frame rate, quality, and realism of images displayed on display device
84.

Computer System Block Diagram — Figure 4 

Referring now to Figure 4, a simplified block diagram illustrating the computer system of Figure 3 is
shown. Elements of the computer system that are not necessary for an understanding of the present invention
are not shown for convenience. As shown, the computer system 80 includes a central processing unit (CPU)
102 coupled to a high-speed memory bus or system bus 104, also referred to as the host bus 104. A system
memory 106 may also be coupled to high-speed bus 104.

Host processor 102 may comprise one or more processors of varying types, e.g., microprocessors,
multi-processors and CPUs. The system memory 106 may comprise any combination of different types of
memory subsystems, including random access memories (e.g., static random access memories or "SRAMs",
synchronous dynamic random access memories or "SDRAMs", and Rambus dynamic random access memories or
"RDRAMs", among others) and mass storage devices. The system bus or host bus 104 may comprise one or
more communication or host computer buses (for communication between host processors, CPUs, and
memory subsystems) as well as specialized subsystem buses.

A 3-D graphics system or graphics system 112 according to the present invention is coupled to the
high-speed memory bus 104. The 3-D graphics system 112 may be coupled to the bus 104 by, for example, a
cross bar switch or other bus connectivity logic. It is assumed that various other peripheral devices, or other
buses, may be connected to the high-speed memory bus 104. It is noted that the 3-D graphics system may be
coupled to one or more of the buses in computer system 80 and/or may be coupled to various types of buses.
In addition, the 3D graphics system may be coupled to a communication port and thereby directly receive
graphics data from an external source, e.g., the Internet or a network. As shown in the figure, display device
84 is connected to 3-D graphics system 112.

Host CPU 102 may transfer information to and from the graphics system 112 according to a
programmed input/output (I/O) protocol over host bus 104. Alternately, graphics system 112 may access the
memory subsystem 106 according to a direct memory access (DMA) protocol or through intelligent bus
mastering. In one embodiment, host CPU 102 may be configured to perform the calculations described above
to: (1) determine whether the scene being rendered will cause the frame rate to fall below a predetermined
minimum threshold, and then (2) vary the rendering parameters according to whether the scene is pixel fill
limited or polygon overhead limited.

A graphics application program conforming to an application programming interface (API) such as
OpenGL may execute on host CPU 102 and generate commands and data that define a geometric primitive
(graphics data) such as a polygon for output on display device 84. As defined by the particular graphics
interface used, these primitives may have separate color properties for the front and back surfaces. Host
processor 102 may transfer these graphics data to memory subsystem 106. Thereafter, the host processor 102
may operate to transfer the graphics data to the graphics system 112 over the host bus 104. In another
embodiment, the graphics system 112 may read in geometry data arrays over the host bus 104 using DMA
access cycles. In yet another embodiment, the graphics system 112 may be coupled to the system memory
106 through a direct port, such as the Accelerated Graphics Port (AGP) promulgated by Intel Corporation.

The graphics system may receive graphics data from any of various sources, including the host CPU 
102 and/or the system memory 106, other memory, or from an external source such as a network, e.g., the
Internet, or from a broadcast medium, e.g., television, or from other sources.

As will be described below, graphics system 112 may be configured to allow more efficient
microcode control, which results in increased performance for handling of incoming color values
corresponding to the polygons generated by host processor 102. Note that while graphics system 112 is depicted
as part of computer system 80, graphics system 112 may also be configured as a stand-alone device (e.g., with
its own built-in display). Graphics system 112 may also be configured as a single chip device or as part of a
system-on-a-chip or a multi-chip module.

Graphics System — Figure 5 

Referring now to Figure 5, a block diagram illustrating details of one embodiment of graphics system
112 is shown. As shown in the figure, graphics system 112 may comprise one or more graphics processors 90,
one or more super-sampled sample buffers 162, and one or more sample-to-pixel calculation units 170A-D.
Graphics system 112 may also comprise one or more digital-to-analog converters (DACs) 178A-B. In one
embodiment, graphics processor 90 may comprise one or more rendering units 150A-D. In the embodiment
shown, however, graphics processor 90 also comprises one or more control units 140, one or more data
memories 152A-D, and one or more schedule units 154. Sample buffer 162 may comprise one or more
sample memories 160A-160N as shown in the figure.

A. Control Unit 

Control unit 140 operates as the interface between graphics system 112 and computer system 80 by
controlling the transfer of data between graphics system 112 and computer system 80. In embodiments of
graphics system 112 that comprise two or more rendering units 150A-D, control unit 140 may also divide the
stream of data received from computer system 80 into a corresponding number of parallel streams that are
routed to the individual rendering units 150A-D. The graphics data may be received from computer system
80 in a compressed form. This may advantageously reduce the bandwidth requirements between computer
system 80 and graphics system 112. In one embodiment, control unit 140 may be configured to split and
route the data stream to rendering units 150A-D in compressed form. In one embodiment, control unit 140
may be configured to perform the calculations described above to determine whether the scene being rendered
will cause the frame rate to fall below a predetermined minimum threshold, and then vary the rendering
parameters according to whether the scene is pixel fill limited or polygon overhead limited.

B. Rendering Units

Rendering units 150A-D (also referred to herein as draw units) are configured to receive graphics instructions
and data from control unit 140 and then perform a number of functions, depending upon the exact
implementation. For example, rendering units 150A-D may be configured to perform decompression (if the
data is compressed), transformation, clipping, lighting, set-up, and screen space rendering of various graphics
primitives occurring within the graphics data. Each of these features is described separately below.

Depending upon the type of compressed graphics data received, rendering units 150A-D may be
configured to perform arithmetic decoding, run-length decoding, Huffman decoding, and dictionary decoding
(e.g., LZ77, LZSS, LZ78, and LZW). In another embodiment, rendering units 150A-D may be configured to
decode graphics data that has been compressed using geometric compression. Geometric compression of 3D
graphics data may achieve significant reductions in data size while retaining most of the image quality. Two
methods for compressing and decompressing 3D geometry are described in U.S. Patent No. 5,793,371,
Application Serial No. 08/511,294 (filed on August 4, 1995, entitled "Method And Apparatus For
Geometric Compression Of Three-Dimensional Graphics Data," Attorney Docket No. 5181-05900) and
U.S. Patent Application Serial No. 09/095,777 (filed on June 11, 1998, entitled "Compression of
Three-Dimensional Geometry Data Representing a Regularly Tiled Surface Portion of a Graphical Object," Attorney
Docket No. 5181-06602). In embodiments of graphics system 112 that support decompression, the graphics
data received by each rendering unit 150 is decompressed into one or more graphics "primitives" which may
then be rendered. The term primitive refers to components of objects that define its shape (e.g., points, lines,
triangles, polygons in two or three dimensions, and polyhedra or free-form surfaces in three dimensions).
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Transformation refers to manipulating an object and includes translating the object (Le,, moving the 
object to a different location), scaling the object (i.e., stretching or shrinking), rotating the object (e.g., in 
three-dimensional space, or "3-space**). 

Clipping refers to defining the limits of the displayed image (ie. f establishing a clipping region, 
5 usually a rectangle) and then not rendering or displaying pixels that tall outside those limits. 

Lighting refers to calculating the illumination of the objects within the displayed image to determine 
what color and or brightness each individual object will have. Depending upon the shading algorithm being 
used (e.g., cons tant, Gouraud, or Phong), lighting must be evaluated at a number of different locations. For 
example! if constant shading is used (i.e., each pixel of a polygon has the same lighting), then the lighting 
1 0 need only be calculated once per polygon. If Gouxaud shading is used, then the Hghting is calculated once per 
vertex. Phong shading calculates the lighting on a per-pixel basis. 
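
As a rough illustration of how the shading algorithm changes the lighting workload, the following minimal Python sketch counts lighting evaluations per frame. The function name and its arguments are hypothetical and are not part of the disclosed system; the sketch also assumes that each counted vertex is lit exactly once.

```python
def lighting_evaluations(shading, polygons, vertices, pixels):
    """Return the number of lighting calculations needed for one frame.

    shading  -- "constant", "gouraud", or "phong"
    polygons -- number of polygons rendered
    vertices -- number of vertices lit (assumed lit once each)
    pixels   -- number of pixels covered by the polygons
    """
    counts = {
        "constant": polygons,  # once per polygon
        "gouraud": vertices,   # once per vertex
        "phong": pixels,       # once per pixel
    }
    return counts[shading.lower()]
```

For a single triangle covering 500 pixels, constant shading needs one evaluation, Gouraud shading three, and Phong shading 500.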

Set-up refers to mapping primitives to a three-dimensional viewport. This involves translating and
transforming each object from its original model space coordinates to a "world-coordinate" system for all
models and then to the established viewport's coordinates. This creates the correct perspective for three-
dimensional objects displayed on the screen.

Screen-space rendering refers to the calculations performed to actually calculate the data used to
generate each pixel that will be displayed. In prior art systems, each pixel is calculated and then stored in a
frame buffer. The contents of the frame buffer are then output to the display device to create the final image.
In the embodiment of graphics system 112 shown in the figure, however, rendering units 150A-D calculate
"samples" instead of actual pixel data. This allows rendering units 150A-D to "super-sample" or calculate
more than one sample per pixel. Super-sampling is described in greater detail below. Note that rendering
units 150A-D may comprise a number of smaller functional units, e.g., a separate set-up/decompress unit and
a lighting unit.

C. Data Memories

Each rendering unit 150A-D may be coupled to an instruction and data memory 152A-D. In one embodiment,
each data memory 152A-D may be configured to store both data and instructions for rendering units 150A-D.
While implementations may vary, in one embodiment each data memory 152A-D may comprise two 8 MByte
SDRAMs providing a total of 16 MBytes of storage for each rendering unit 150A-D. In another embodiment,
RDRAMs (Rambus DRAMs) may be used to support the decompression and set-up operations of each
rendering unit, while SDRAMs may be used to support the draw functions of rendering units 150A-D.

D. Schedule Unit

Schedule unit 154 may be coupled between the rendering units 150A-D and the sample memories 160A-N.
Schedule unit 154 is configured to sequence the completed samples and store them in sample memories 160A-
N. Note in larger configurations, multiple schedule units 154 may be used in parallel.

E. Sample Memories

Sample memories 160A-160N comprise super-sampled sample buffer 162, which is configured to store the
plurality of samples. As used herein, the term "super-sampled sample buffer" refers to one or more memories
that store samples. As previously noted, one or more samples are filtered to form output pixels (e.g., pixels to
be displayed on a display device), and the number of samples stored may be greater than, equal to, or less than
the total number of pixels output to the display device to refresh a single frame. Each sample may
correspond to one or more output pixels. As used herein, a sample corresponds to an output pixel when the
sample's information contributes to the final output value of the pixel.

Stated another way, the super-sampled sample buffer comprises a sample buffer which stores a
plurality of samples. The samples have positions that correspond to locations on the display, i.e., the samples
contribute to one or more output pixels at a respective location on the display. These locations may
correspond to the center of pixels on the display device, or they may correspond to positions that are between
pixel centers on the display device. The number of stored samples may be greater than the number of pixel
locations, and more than one sample may be combined in the convolution (filtering) process to generate one
or more pixels displayed on the display device.

Sample memories 160A-160N may comprise any of a number of different types of memories (e.g.,
SDRAMs, SRAMs, RDRAMs, 3DRAMs) in varying sizes. Note while the embodiment described herein
utilizes a super-sampled sample buffer, other embodiments may use a traditional pixel frame buffer.
However, when using a super-sampled sample buffer, a set of graphics data determined to be pixel-fill limited
may cause the graphics system (or corresponding software) to reduce the sample density for part or all of the
scene being rendered to improve the frame rate.

Graphics processor 90 may be configured to generate a plurality of sample positions according to a
particular sample positioning scheme (e.g., a regular grid, a perturbed regular grid, etc.). Alternatively, the
sample positions may be read from a memory (e.g., a ROM table). Upon receiving a polygon that is to be
rendered, graphics processor 90 determines which samples fall within the polygon. Graphics processor 90
renders the samples and then stores them in sample memories 160A-N. Note as used herein the terms render
and draw are used interchangeably and refer to calculating color values for samples. Alpha values and other
per-sample values may also be calculated in the rendering or drawing process. In one embodiment, graphics
processor 90 may be configured to perform the calculations described above to determine whether the scene
being rendered will cause the frame rate to fall below a predetermined minimum threshold, and then vary the
rendering parameters according to whether the scene is pixel fill limited or polygon overhead limited.
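
The sample positioning and coverage steps described above can be sketched in Python as follows. This is only an illustrative sketch: the function names, the jitter parameter, and the use of an edge-function test for triangle coverage are assumptions made for the example and are not taken from the disclosure.

```python
import random


def perturbed_grid(width, height, per_side, jitter=0.25, seed=0):
    """Generate sample positions on a perturbed regular grid.

    Each pixel receives per_side x per_side samples. Every sample starts at
    the center of its sub-cell and is nudged by at most jitter * cell size,
    so with jitter <= 0.5 it stays inside its own sub-cell.
    """
    rng = random.Random(seed)
    step = 1.0 / per_side
    positions = []
    for py in range(height):
        for px in range(width):
            for j in range(per_side):
                for i in range(per_side):
                    dx = rng.uniform(-jitter, jitter) * step
                    dy = rng.uniform(-jitter, jitter) * step
                    positions.append((px + (i + 0.5) * step + dx,
                                      py + (j + 0.5) * step + dy))
    return positions


def edge(a, b, p):
    """Signed area test: which side of the directed edge a->b is point p on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def inside(tri, p):
    """True if p lies inside (or on the boundary of) triangle tri."""
    d0 = edge(tri[0], tri[1], p)
    d1 = edge(tri[1], tri[2], p)
    d2 = edge(tri[2], tri[0], p)
    has_neg = d0 < 0 or d1 < 0 or d2 < 0
    has_pos = d0 > 0 or d1 > 0 or d2 > 0
    return not (has_neg and has_pos)


def covered_samples(tri, positions):
    """Return the sample positions that fall within the triangle."""
    return [p for p in positions if inside(tri, p)]
```

Setting jitter to zero yields a regular grid. Reducing per_side corresponds to lowering the sample density, which is one of the rendering parameters that may be varied when a scene is pixel fill limited.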

E. Sample-to-pixel Calculation Units

Sample-to-pixel calculation units 170A-D may be coupled between sample memories 160A-N and DACs
178A-B. Sample-to-pixel calculation units 170A-D are configured to read selected samples from sample
memories 160A-N and then perform a convolution (e.g., a filtering and weighting function) on the samples to
generate the output pixel values which are output to DACs 178A-B. The sample-to-pixel calculation units
170A-D may be programmable to allow them to perform different filter functions at different times,
depending upon the type of output desired. In one embodiment, the sample-to-pixel calculation units 170A-D
may implement a 4x4 super-sample reconstruction band-pass filter to convert the super-sampled sample
buffer data (stored in sample memories 160A-N) to single pixel values. In another embodiment, calculation
units 170A-D may average a selected number of samples to calculate an output pixel. The averaged samples
may be multiplied by a variable weighting factor that gives more or less weight to samples having positions
close to the center of the pixel being calculated. Other filtering functions may also be used either alone or in
combination, e.g., tent filters, circular and elliptical filters, Mitchell filters, band pass filters, sinc function
filters, etc.
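
A weighted average of samples can be sketched in Python as below. The tent-shaped weighting, the radius parameter, and the single-channel sample values are assumptions chosen for illustration; they are not presented as the filter actually used by calculation units 170A-D.

```python
def filter_pixel(samples, center, radius=1.0):
    """Compute one output value as a tent-weighted average of samples.

    samples -- list of ((x, y), value) pairs, value being a single channel
    center  -- (x, y) position of the pixel center
    radius  -- distance at which a sample's weight falls to zero
    """
    total = 0.0
    weight_sum = 0.0
    for (x, y), value in samples:
        dist = ((x - center[0]) ** 2 + (y - center[1]) ** 2) ** 0.5
        weight = max(0.0, 1.0 - dist / radius)  # closer samples weigh more
        total += weight * value
        weight_sum += weight
    # If no sample lies within the radius, return 0.0 as a neutral value.
    return total / weight_sum if weight_sum > 0.0 else 0.0
```

A sample at the pixel center receives weight 1, a sample half a radius away receives weight 0.5, and samples at or beyond the radius do not contribute.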

Sample-to-pixel calculation units 170A-D may also be configured with one or more of the
following features: programmable video timing generators, programmable pixel clock synthesizers, crossbar
functions, and color-look-up tables. Once the sample-to-pixel calculation units have manipulated the timing
and color of each output pixel, the output pixels are conveyed to DACs 178A-B.

F. DACs

DACs 178A-B operate as the final output stage of graphics system 112. The DACs 178A-B serve to translate
the digital pixel data received from cross units 174A-B into analog video signals that are then sent to the
display device. Note in one embodiment DACs 178A-B may be bypassed or omitted completely in order to
output digital pixel data in lieu of analog video signals. This may be useful when display device 84 is based
on a digital technology (e.g., an LCD-type display or a digital micro-mirror display).

Definitions

The following functional notations will be used herein:

s = model space to screen space scaling factor;

f(x) = probability of a randomly chosen polygon from a collection of polygons having an area of x or less;

g(x) = probability that a randomly chosen point on the surface formed by a collection of polygons belongs to a
polygon having an area of x or less;

f̂(x) = f(x) for a unity scaling factor s (i.e., s=1);

ĝ(x) = g(x) for a unity scaling factor s (i.e., s=1);

pm(a) = the model space probability distribution of a single polygon having an area A; and

α = an estimate of the fraction of polygons that are outside the view frustum.

Parameterization of a Polygon - Figures 6-11 
Conceptually, there are three general classes of tessellated objects: (1) objects that have been
pre-tessellated to meet certain surface curvature and detail of interest criteria; (2) objects that are
dynamically tessellated to meet size criteria in screen space; and (3) objects that are statically tessellated to
meet certain size criteria in model space.

The first class may include most traditional triangulated objects, whether hand-digitized, 3D scanned
and simplified, or tessellated from a parametric representation. The second class may include parametric
objects dynamically tessellated by various shaders to produce micropolygons. Simple shaders include texture
mapping, bump mapping, and displacement mapping. The parametric representation can be as simple as a
polygon with texture coordinates, or as complex as high order NURBS (Non-Uniform Rational B-Splines).
The third class is from so-called geometric shaders, and generally results from pre-applying shaders that are
too complex to evaluate in real-time (e.g., procedural textures). Each of these different classes may produce
somewhat different screen space statistics of polygons, but the analysis tools developed in the next few
sections apply to all classes.

A common representation of objects to be rendered in 3D computer graphics is as a collection of
model space polygons. Such an object may be referred to herein as a geometry. Note while triangles may be
used in some examples herein for simplicity, the disclosure and claims may be applied more generally to all
polygons. During the rendering process, the individual polygons are transformed to a common model space
and then projected to screen space. The final rendering process then draws pixels into the frame buffer (or
samples into the sample buffer) for eventual display on the display device. First, the properties of screen
space polygons are discussed. The results are then generalized to the original model space polygons. For
simplicity, in this section all polygons are assumed to be visible, and not subject to frustum, face, or occlusion
clipping or culling (these are described in greater detail below).

To determine the area statistics of a collection of screen space polygons, one approach may be to
compute a histogram of the frequency of occurrence of screen space polygons of a given area. Normalizing
this histogram by the total count of polygons results in a probability distribution function p(a), which
represents the probability of a random screen space polygon having the screen area a.

However, linear plots of these probability distributions are visually uninformative, as they tend to
look like extreme exponential curves smashed up against the small end of the area plot. In order to make
interesting details visible, the probability distribution may be plotted using something like a logarithmic axis
for area. Unfortunately, the use of a log axis destroys one of the nice visual properties of probability
distributions, i.e., the area under the curve no longer indicates the relative population of a given area of
polygon. Probability distributions have another limitation when using empirical data from real objects
because quantization effects can leave artifacts in the curve, thereby necessitating artificial smoothing. To
avoid these issues, a cumulative probability distribution may be used. Thus the function f(a) may be defined
as the probability of a randomly selected polygon having an area of a or less. Given p(a), f(a) is just the
definite integral of p(a) between 0 and a:

\[ f(a) = \int_0^a p(x)\,dx \tag{1} \]

It may also be useful to have a function for the cumulative area of polygons, i.e., what amount of the
total surface area of the object or scene being rendered is accounted for by polygons of area less than or equal
to a. Another way to think of this is the probability that a randomly chosen point on the surface belongs to a
polygon with area less than or equal to a. This cumulative area probability is referred to herein as g(a). Given
p(a), g(a) is:

\[ g(a) = \frac{\int_0^a x\,p(x)\,dx}{\int_0^\infty x\,p(x)\,dx} = \frac{1}{\text{total area}} \int_0^a x\,p(x)\,dx \tag{2} \]



The lower term is just the total (screen space) area of the geometry. Using these definitions, f() and
g() may be computed for a collection of geometries and then plotted.
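
For empirical data, the integrals in equations (1) and (2) reduce to sums over the measured polygon areas. The following Python sketch shows one way this might be computed; the function name and the list-of-areas input are assumptions made for illustration.

```python
def cumulative_stats(areas, a):
    """Return (f(a), g(a)) for a list of screen space polygon areas.

    f(a) is the fraction of polygons with area <= a.
    g(a) is the fraction of the total area held by polygons with area <= a.
    """
    n = len(areas)
    total_area = sum(areas)
    small = [x for x in areas if x <= a]
    return len(small) / n, sum(small) / total_area
```

For example, with areas of 1, 1, 2, 4, and 12 pixels and a = 2, three of the five polygons qualify, so f(2) = 0.6, but they hold only 4 of the 20 pixels of area, so g(2) = 0.2. This gap between f and g is the effect discussed below for objects whose area is concentrated in a few large triangles.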



Turning now to Figure 6, an image of five different three-dimensional rendered objects is shown.
The objects are a V22 Osprey aircraft 250, a triceratops 252, an engine 254, a Buddha 256, and a
Tyrannosaurus Rex (T Rex) skeleton 258.

Osprey 250 is a traditional tessellated object from Viewpoint Datalabs having approximately 30,000
triangles. Triceratops 252 is a textured object having approximately 250,000 triangles. It was
produced by applying a shader that mip-mapped an image texture onto a Viewpoint Datalabs three-
dimensional model having approximately 6,000 triangles. Engine 254 is an MCAD model having
approximately 250,000 triangles. Buddha 256 is a Cyberware-scanned object having
approximately 300,000 triangles. T Rex 258 is a Viewpoint Datalabs three-dimensional model having
approximately 130,000 triangles.

Using the definitions for f() and g() described above, graphs of f() and g() for each object are shown
in Figures 7-11. In the graphs, f(a) is depicted as a solid black line and g(a) is depicted as a long-spaced
dashed line. The functions e(a), which is depicted as a small-spaced dashed line, and h(a), which is depicted as a
medium-spaced dashed line, will be defined further below. For reference, the triangle
counts, screen space area, and depth complexity are displayed above each graph in Figures 7-11. Each object
was drawn individually and scaled to fit within a 960 x 680 window.

A variety of observations can be made from the curves in the graphs of Figures 7-11. First, note that
f^{-1}(0.5) is the median triangle area. Second, for Osprey 250, engine 254, and to some extent T Rex 258, the
f(a) curve is shifted substantially to the left of the g(a) curve. This reflects that the majority of the triangles
are relatively small in area and that a great deal of the area is locked up in a relatively small number of large
triangles. The variance in triangle area can also be seen in the graphs. The range of the eightieth percentile
cut includes triangles having sizes that vary between one and two orders of magnitude.

Empirically Understanding f() and g()

Osprey 250 is the simplest of the five objects, having less than an eighth as many triangles as the
other objects, excluding T Rex 258. Nevertheless, examining f^{-1}(0.5) indicates that the median triangle is less than
two pixels in area. From g^{-1}(0.1), it can be seen that 90% of the area is locked up in triangles greater than 10
pixels in size. These statistics turn out to be caused by fine geometric details in the wheels and landing gear.
This sort of investigative analysis is one use of the cumulative curves, i.e., analyzing objects to see if and
where opportunities for triangle count reduction exist. While engine 254 has eight times the triangle count of
Osprey 250, engine 254 has otherwise similar f() and g() curves. A reason for this can be seen by comparing
the screen area of the two objects (screen area equaling the total number of pixels rendered, not unique pixels
touched). Engine 254 has five times the screen area of the Osprey, and thus, in terms of normalized screen
area, engine 254 has only 8/5ths more triangles per pixel rendered. Given this, it is not surprising that the f()
and g() statistics would be similar. In the other three objects, it is clear that 90% of the triangles are less than
three and a half pixels in area. Thus it appears that these objects were not tessellated with large triangles in
mind. These large numbers of small triangles place an order of magnitude more demand on real-time
hardware renderers when compared with tessellations that use fewer and larger triangles. Note the shapes of
the curves are dependent only on the object being rendered. Changes in scale will only result in the shifting of
the curves to the right or left on the graph.

Triangle Aspect Ratio and Skew Statistics 

While the primary determiner of rendering performance is triangle area, in some cases other factors
contribute as well. A complete characterization of screen space triangles thus includes not just triangle area, but
also triangle aspect ratio and skew. Turning now to Figure 12, a diagram illustrating the calculation of a
triangle's aspect ratio and skew is shown. As used herein, aspect ratio is defined as a number between 0 and
√3/2 representing the ratio of the triangle height to the triangle's longest edge. The largest possible aspect
ratio is √3/2 (for an equilateral triangle). Note a right isosceles triangle will have an aspect ratio of 1/2.
Aspect ratio is important when a triangle is many pixels in width, but only a pixel or two in height. This is
because most rendering hardware is relatively inefficient in filling such triangles. This is caused by various
implementation-specific bottlenecks, including: page misses, bank fragmentation, and valid pixel search
overhead. Skew is defined as how close the third vertex is to the closest vertex of the longest edge of the
triangle, and varies between 1 (thin triangle) and 1/2 (symmetrical triangle).
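
The aspect ratio defined above (height over longest edge) can be computed from three 2D vertices as in the following Python sketch. The function name is hypothetical, and skew is left out because its exact formula is given only in Figure 12.

```python
import math


def aspect_ratio(p0, p1, p2):
    """Return triangle height divided by longest edge (0 .. sqrt(3)/2).

    The height is measured perpendicular to the longest edge, so it equals
    twice the triangle area divided by the length of that edge.
    """
    longest = max(math.dist(p0, p1), math.dist(p1, p2), math.dist(p2, p0))
    if longest == 0.0:
        return 0.0  # degenerate triangle: all three vertices coincide
    area = abs((p1[0] - p0[0]) * (p2[1] - p0[1])
               - (p1[1] - p0[1]) * (p2[0] - p0[0])) / 2.0
    height = 2.0 * area / longest
    return height / longest
```

An equilateral triangle yields √3/2, a right isosceles triangle yields 1/2, and a triangle ten pixels wide but a tenth of a pixel high yields 0.01, which is the thin case that most hardware fills inefficiently.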

Empirically Understanding Aspect Ratio

Figures 13 and 14 are graphs that show histograms of aspect ratio in model space (dashed line) and
screen space (solid line) for T Rex 258 and engine 254. The T Rex model space curve has a pronounced peak
near 1/2, indicative of a preponderance of pairs of right triangles from near-square quadrilaterals. The model
space curves for Buddha 256 and textured triceratops 252 (not shown) are similar, with even more
pronounced peaks at 1/2. Engine 254, by contrast, has a much more equal distribution that is mostly in the
range of 0 to 1/2. Engine 254 is also constructed mostly of right triangles, but because of the tessellator, many
of these are from more elongated quadrilaterals. Osprey 250's distribution (not shown) is similar to that of
engine 254. The projection into screen space tends to smear the aspect ratio probability curves to the left, i.e.,
towards thinner triangles. This is because most projection angles will make a given triangle thinner in one
direction, while only a few angles will make thin triangles fatter. This provides a theoretical basis for (as well
as an empirical validation of) the observation that screen space triangles tend to be thin. This trend is useful
for architectures that have large performance penalties for small thin triangles. Histograms of skew (not
shown) in both model space and screen space tended to be similar and quite flat. Skew typically has little
impact on most hardware's performance, and is only mentioned here for completeness.

Model Space to Screen Space Transformation — Figures 15A-B 

Turning now to Figure 15A, an example of a polygon 150 in model space is shown. Model space
refers to the coordinate system used when generating the three-dimensional objects to be rendered. Polygon
150 is a triangle defined by three vertices, each having a three-dimensional coordinate in (x,y,z) format. How
polygon 150 is transformed from model space to screen space is determined by the relative position and
orientation of viewpoint 152 (also referred to as a camera). For example, if viewpoint 152 were located at
(100,50,0) and directed at the model space origin (0,0,0), then polygon 150 would either disappear from the
model space view (illustrated in Figure 15B) or appear as a thin line. This is because polygon 150 has no
depth, and viewpoint 152 would be viewing polygon 150 directly from the edge (similar to looking at a piece
of paper edge-on).

Figure 15B illustrates the appearance of polygon 150 in screen space (i.e., how polygon 150 would
appear when viewed on the display of a computer) for the viewpoint position and orientation illustrated in
Figure 15A. Screen space refers to the coordinate system of the pixels on the display device. Model space and
screen space coordinates are related as a function of the viewpoint used for rendering. Polygon 150 may be
rotated and scaled during the transformation process. This scaling process allows objects positioned closer to
viewpoint 152 to appear larger than objects farther away from viewpoint 152. This scaling process may be
quantified as a model space to screen space scaling factor "s".

Polygon statistics may be calculated in both model space and screen space. Taking statistics in
model space involves processing geometry data at the user level. This only has to be performed once per
object. Taking statistics in screen space, by contrast, is more difficult. Either an entire rendering package is
written, or an existing package is instrumented (assuming one has access to the source code). Another
problem is that the results are view dependent. While difficult to calculate, these statistics are useful to
understand the behavior of rendered polygons in screen space.

However, it is also possible to model the average screen space statistics of a given geometry by a 
transformation of its model space statistics. By averaging screen space statistics over multiple different 
viewing angles, in the limit the results should look like the convolution of the model space statistics with the 
statistics of projecting a single polygon at all possible angles. Furthermore, for high polygon count objects, the 
screen space statistics tend not to vary much with orientation, because geometric detail tends to exist at all 
orientations. 

For example, consider a single polygon of area A in model space. It has a model space probability
distribution of a Dirac-delta function: pm(a) = δ(A-a). Assuming that the scale of the polygon is small
relative to the distance of the polygon from the viewpoint, the effects of the viewing projection can be
factored into the following two pieces: (1) the model space to screen space scaling factor "s" (resulting in a
maximum screen space area of sA); and (2) a rotation in the profile of the polygon. Thus when this polygon
is projected over multiple different view orientations (but all from the same distance), these functions are first
scaled by s and then smeared into screen space. Thus, the screen space statistics are:

\[ p(a) = \begin{cases} \dfrac{1}{sA}, & 0 \le a \le sA \\ 0, & \text{otherwise} \end{cases} \tag{3} \]

This means that the conditional distribution of the projected polygons is uniform. A proof of this is outlined
further below (see section entitled Proof of Constancy of Projection).

\[ f(a) = \begin{cases} \dfrac{a}{sA}, & 0 \le a \le sA \\ 1, & \text{otherwise} \end{cases} \tag{4} \]

\[ g(a) = \begin{cases} \dfrac{a^2}{(sA)^2}, & 0 \le a \le sA \\ 1, & \text{otherwise} \end{cases} \tag{5} \]



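In the general case, given a model space probability distribution of pm(a), and a model space to
screen space scale factor s, the aggregate screen space statistics (i.e., for a unity scaling factor s=1) are given
by the following equations:

\[ \hat{f}(a) = \int_0^a pm(x)\,dx + \int_a^\infty \frac{a}{x}\,pm(x)\,dx, \quad \text{wherein } \hat{f}(a) = f(as) \tag{6} \]

\[ \hat{g}(a) = \frac{\int_0^\infty \int_0^{\min(a,y)} \frac{x}{y}\,pm(y)\,dx\,dy}{\int_0^\infty \int_0^{y} \frac{x}{y}\,pm(y)\,dx\,dy}, \quad \text{wherein } \hat{g}(a) = g(as) \tag{7} \]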
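
When the model space statistics are available as a list of polygon areas rather than a continuous pm(), the integrals above become sums. The Python sketch below makes that assumption (each listed polygon weighted equally) and also assumes that every model area is positive; the function names are hypothetical.

```python
def f_hat(model_areas, a):
    """Unity-scale f: a polygon of model area A projects to a screen area
    uniformly distributed on [0, A], so it is <= a with probability
    min(1, a / A). Average that probability over all polygons."""
    return sum(min(1.0, a / A) for A in model_areas) / len(model_areas)


def g_hat(model_areas, a):
    """Unity-scale g: expected screen area contributed by projections of
    area <= a, divided by the expected total screen area. For one polygon
    these are min(a, A)**2 / (2 * A) and A / 2; the factors of 1/2 cancel."""
    numerator = sum(min(a, A) ** 2 / A for A in model_areas)
    return numerator / sum(model_areas)


def f_screen(model_areas, a, s):
    """f at scale factor s, using f_hat(a) = f(a * s), i.e. f(a) = f_hat(a / s)."""
    return f_hat(model_areas, a / s)


def g_screen(model_areas, a, s):
    """g at scale factor s, using g(a) = g_hat(a / s)."""
    return g_hat(model_areas, a / s)
```

For a single polygon of model area A these functions reduce to a/(sA) and a²/(sA)² on the interval from 0 to sA, which matches the single-polygon case. Because only the argument a/s changes with s, a change of scale shifts the curves along a logarithmic area axis without changing their shape.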



It is implicit in these functions that changes in s will result only in a shifting of a constant curve with
respect to a. This conjecture was empirically confirmed by comparing screen space plots of f(a) and g(a) with
plots obtained by numerically integrating pm() traces for the same object.

Turning to Figure 16, a graphic showing an empirical plot of f(a) for random view orientations of T
Rex 258 is shown. In the figure, each thin black line represents an empirical plot of f(a) for a single random
view orientation, and the thick black line is f(a) derived from numerically integrating model statistics of fm(a).
The prediction was for both the shape of the curve and the position of the curve. The position was generated
using a value of s computed from the dynamic modeling and viewing transforms. There is some variation due
to angle of view, but the overall prediction fits quite well. A similar approach can be used to predict the
effects of projection on aspect ratio.

Modeling Rendering Performance 

A simplistic model of rendering performance for a machine is just its pixel fill rate, measured in units
of pixels per second. In this model, the per-frame rendering time of a given geometry would be just its total
rendered screen space area divided by the pixel fill rate. If the scenes being rendered consist mainly of large
polygons, this is a fairly accurate model. However, as previously discussed, large polygons are rarely the




A more realistic model of rendering performance takes into account that for any given rendering
hardware and setting of rendering attributes, there exists a critical screen area of polygon below which the
rendering hardware will not be fill limited, but minimum polygon processing overhead time limited. Changes
in per-vertex rendering attributes, such as increases in the number of or complexity of light sources, tend to
increase the minimum polygon processing overhead time. Changes in per-pixel attributes, such as enabling
transparency, texturing, or anti-aliasing, or increases in the complexity of texturing, tend to decrease the per-
pixel fill rate.

As previously noted, Figure 2 is a plot of polygon render time vs. polygon area based on empirical
data obtained by timing of real hardware using glperf (a performance timing application program for
OpenGL programs). The horizontal line (fitted to the data) represents the minimum polygon processing
overhead time limit. The sloped line represents an asymptotic fill rate limit of 224 million pixels per second.
For the particular machine used to generate the plot, and a particular set of rendering attributes, a_c is about 38
pixels, and fits the predicted rendering performance model rather well.
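
The two fitted lines can be read as a simple per-polygon timing model: a polygon costs either the minimum overhead time or its area divided by the fill rate, whichever is larger. The Python sketch below assumes exactly that two-line model; the function names are hypothetical, and the overhead time used in the usage note is only the value implied by the stated fill rate and critical area, not a figure given in the text.

```python
def polygon_time(area, fill_rate, overhead):
    """Render time of one polygon under the two-line model.

    area      -- screen space area in pixels
    fill_rate -- asymptotic fill rate in pixels per second
    overhead  -- minimum polygon processing overhead time in seconds
    """
    return max(overhead, area / fill_rate)


def critical_area(fill_rate, overhead):
    """Area at which the overhead line and the fill rate line intersect."""
    return fill_rate * overhead
```

With a fill rate of 224 million pixels per second and a critical area of 38 pixels, the implied overhead is 38 / 224e6 seconds, or roughly 170 nanoseconds per polygon. Under this model any polygon smaller than 38 pixels costs the same as a 38 pixel polygon.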

To characterize the rendering performance of hardware by a single number, the concept of "false
area" may be used. False area converts the effects of minimum polygon overhead into an equivalent area.
The idea is that any polygon having an area less than a hardware-specific critical area a_c is said to have the
false area a_c. Polygons larger than a_c are said to have only "real area". The "effective area" of a polygon is
defined to be either its false area or its real area, depending on which side of a_c its area lies. The term
"ordinary area" denotes the standard meaning of area. These terms may be extended to apply to an entire
geometry by adding up the individual areas of the constituent polygons. Thus, the per-frame
rendering time of a given geometry is the effective area of the geometry divided by the pixel fill rate of the
graphics system. All these areas may be characterized for a geometry in an architecture-independent manner
by parameterizing them by a_c. Formally, for a given geometry consisting of n polygons (all front facing), with
screen space polygon probability p(), for a_c = a, these terms are defined in the following equations:

\[ \text{ordinary\_area}(a_c) = \int_0^\infty x \cdot p(x)\,dx = \text{total\_area} \tag{8} \]

\[ \text{false\_area}(a_c) = a_c \cdot n \cdot \int_0^{a_c} p(x)\,dx = a_c \cdot n \cdot f(a_c) \tag{9} \]

\[ \text{real\_area}(a_c) = \int_{a_c}^\infty x \cdot p(x)\,dx = \text{total\_area} \cdot (1 - g(a_c)) \tag{10} \]

\[ \text{effective\_area}(a_c) = \text{false\_area}(a_c) + \text{real\_area}(a_c) = a_c \cdot n \cdot \int_0^{a_c} p(x)\,dx + \int_{a_c}^\infty x \cdot p(x)\,dx \tag{11} \]

The rendering performance for a geometry can be characterized in an architecture-independent
manner by defining the function e(a) to be the ratio of the effective area of a geometry to its ordinary area for
a value of a_c = a. The function e(a) can be defined in terms of the previously defined functions as follows:

\[ e(a_c) = \frac{\text{effective\_area}(a_c)}{\text{ordinary\_area}(a_c)} = \frac{a_c}{\text{average\_area}} \cdot f(a_c) + 1 - g(a_c) \tag{12} \]

The function e(a) indicates how many times slower than fill-limited the geometry will render (for a value of
a_c = a).
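
For a measured list of screen space polygon areas, the area terms and e() can be computed directly by summation, as in the following Python sketch. The function names are hypothetical, all polygons are assumed to be front facing, and the totals are computed as sums over the individual polygons (so average_area is the total area divided by n).

```python
def area_terms(areas, a_c):
    """Return (ordinary, false, real, effective) area for a geometry.

    areas -- screen space areas of the front-facing polygons, in pixels
    a_c   -- critical area of the hardware, in pixels
    """
    ordinary = sum(areas)
    # Every polygon no larger than a_c is charged the false area a_c.
    false = a_c * sum(1 for x in areas if x <= a_c)
    # Larger polygons are charged their own (real) area.
    real = sum(x for x in areas if x > a_c)
    return ordinary, false, real, false + real


def slowdown(areas, a_c):
    """e(a_c): effective area divided by ordinary area."""
    ordinary, _, _, effective = area_terms(areas, a_c)
    return effective / ordinary


def frame_time(areas, a_c, fill_rate):
    """Estimated per-frame render time: effective area / pixel fill rate."""
    return area_terms(areas, a_c)[3] / fill_rate
```

With areas of 1, 1, 2, 4, and 12 pixels and a_c = 2, the false area is 6, the real area is 16, and the effective area is 22 against an ordinary area of 20, so e(2) = 1.1. The same value follows from the closed form: (2/4)(0.6) + 1 - 0.2 = 1.1.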

This is not necessarily a complete model, however. Modern complex geometries are typically closed
shells, with the back-faces not typically intended to be seen. Thus, back-face culling is typically performed
(this was the case for our five example geometries). For a more accurate estimation of performance, these
back-facing polygons may also be taken into account. On many machines, back-facing polygons cannot be
dispensed with in less than the minimum front-facing polygon processing time. Thus, back-facing polygons
can be said to have a false area of a_c. To extend the formulas disclosed above to include back-facing
polygons, it is assumed that on average half of the polygons will be back facing. This results in an additional
"n" polygons with false area a_c.



Thus, more complete definitions of false area and e(a) are set forth below:

\[ \text{false\_area}(a_c) = a_c \cdot n \cdot (f(a_c) + 1) \tag{13} \]

\[ e(a_c) = \frac{a_c}{\text{average\_area}} \cdot (f(a_c) + 1) + 1 - g(a_c) \tag{14} \]

In these equations, n is still the number of front facing polygons, not the total count of polygons in
the geometry. On some machines, back-facing polygons can be dispensed with less overhead than a minimal
rendered polygon. In these cases back-facing polygons may have a false area of less than a_c. In this case the
a_c · n term added to the false area may be replaced with the actual ratio of back face processing time to
minimal polygon rendering time, times n.

Turning now back to Figures 7-11, e(a) as defined above is plotted as the short dashed black line. Note that unlike all the other functions plotted, e(a) has a vertical axis not of 0-1, but a magnified range of 0-10X (scale shown on the right-hand side of the graph). A related way of looking at the false area effect is to plot the ratio of false area to effective area for a_c = a. This is h(a), and is shown by the medium dashed line in the figures. The function h(a) varies from 0 to 1, and can be directly read as the percentage of the time for which a given geometry will be transform bound vs. fill bound in a graphics system with a_c = a.

$$h(a_c) = \frac{\text{false\_area}(a_c)}{\text{effective\_area}(a_c)} = \frac{a_c \cdot \bigl(f(a_c) + 1\bigr)}{a_c \cdot \bigl(f(a_c) + 1\bigr) + \text{average\_area} \cdot \bigl(1 - g(a_c)\bigr)} \qquad (15)$$



Note the curves for e(a) and h(a) shift to the right or left if the model space to screen space scale for the geometry being rendered increases or decreases. However, for a given machine and a given set of rendering parameters, a_c is typically a constant, and thus the rendering efficiency of a machine for a geometry changes when the scale of the geometry changes.
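A minimal sketch of equations (14) and (15), assuming f and g are taken as empirical cumulative fractions over a list of front-facing screen-space polygon areas. The `backface_cost` parameter is a hypothetical knob standing in for the ratio of back-face processing time to minimal polygon rendering time; a value of 1.0 corresponds to the equations as written.

```python
def false_area_metrics(areas, a_c, backface_cost=1.0):
    """Return (e, h) per equations (14) and (15) for front-facing areas."""
    n = len(areas)
    average_area = sum(areas) / n
    small = [x for x in areas if x <= a_c]
    f = len(small) / n            # fraction of polygons with area <= a_c
    g = sum(small) / sum(areas)   # fraction of area held by those polygons
    # False area per front-facing polygon: the small front faces plus the
    # assumed one back-facing polygon per front-facing polygon.
    false_area = a_c * (f + backface_cost)
    # True area per front-facing polygon contributed by the larger polygons.
    true_area = average_area * (1.0 - g)
    e = (false_area + true_area) / average_area
    h = false_area / (false_area + true_area)
    return e, h


# Hypothetical areas in pixels; h near 1 means polygon-overhead bound,
# h near 0 means fill bound.
e_value, h_value = false_area_metrics([1.0, 1.0, 2.0, 16.0], a_c=2.0)
```

Working per polygon rather than per geometry keeps n out of the ratios, which is why neither e nor h depends on the polygon count itself.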

Empirically Understanding e() and h()

In Figure 7, even though 90% of Osprey 250's area is in triangles greater than ten pixels in size, the graph of e(10) shows that a machine with a critical area a_c of 10 would render Osprey 250 at less than half the speed of the fill-limited rate. This is because 75% of the triangles are less than ten pixels in area. The machine of Figure 2, with a critical area of 38 pixels, would be more than six times slower than fill limited (the empirical number on this object was 6.8X slower; the prediction is 6.4X). Even a machine with an a_c of one pixel would be nearly a factor of two slower than fill rate limited for T Rex 258, Buddha 256, and triceratops 252. The reason why can be seen from h(1) for these three objects, i.e., 50% to 70% of the render time is locked up in false area.

Applying e() and h() to Hardware Design

These functions can be applied to graphics hardware design. Given a collection of target geometries to be rendered, one can directly trade off the difference between incremental improvements in polygon rate vs. fill rate. A fill rate improvement of a factor of two may reduce the rendering time for real area by a factor of two, but may also increase a_c by a factor of two. While the overall effect may be to reduce total rendering time, if the geometry was already 90% false area limited (seen by examining h(a)), then the factor of two fill rate improvement will result in less than a 10% rendering time improvement (some real area will turn into false area). Even if the geometry is only 50% false area limited, an infinite improvement in fill rate may only result in a factor of two rendering time improvement. Making the base polygon rate twice as fast may result in a factor of two reduction in a_c. If the geometry was 90% false area limited, then the rendering time may improve by no greater than 45% (some of the false area will turn into real area). The marginal gain depends on the slope of the curves near a_c.

As an example, in the Buddha object h() is 90% at an a_c of 2, and e(2) is 4X slower than fill limited. Changing a_c to 1 reduces h() to 70%, and e() to about 2.3X, making rendering 1.7 times faster. If instead the fill rate had been doubled, a_c would have doubled from 2 to 4, and e() would nearly double from 4X to 7.8X, almost completely wiping out the factor of two gain in fill rate.
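The arithmetic behind this example can be written out as follows, under the assumption that the render time t is proportional to e divided by the fill rate (the values of e are those quoted above for the Buddha object):

$$\frac{t(a_c = 2)}{t(a_c = 1)} = \frac{4}{2.3} \approx 1.7, \qquad \frac{t(\text{fill rate doubled},\ a_c = 4)}{t(\text{original},\ a_c = 2)} = \frac{7.8 / 2}{4} \approx 0.98$$

On these numbers, doubling the polygon rate yields roughly a 1.7X speed-up, while doubling the fill rate shortens the render time by only about 2%.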

General purpose 3D rendering hardware accelerates the rendering of all sorts of objects, and improvements in fill rate that have negligible effect on most objects will nevertheless be effective for some objects. But overall, for a target market, the statistics of the class of objects to be rendered can be measured, and a well-balanced hardware architecture may trade off polygon rate and fill rate hardware resources to minimize rendering time for those objects. This may generally be accomplished by keeping e(a_c) in the small integer range.

More Complex Models

Near the critical area, hardware can have a somewhat lower fill rate, due to the effects of frame buffer memory fragmentation, inefficient vertex chaining, and low aspect ratio polygons. If necessary, given specific rendering hardware, more accurate machine-specific values for the functions outlined above at a ≈ a_c can be computed. The more accurate values can take these and other effects into account.

Another limitation of the generic model disclosed above is that it assumes isotropic distributions of orientations of polygons in model space. This is not always the case. For example, the large number of flat rib bones in T Rex 258 caused the variations seen in Figure 16. The behavior of such geometries can be better approximated by the appropriate interpolation of a small number of view-specific statistics. Despite these limitations, the e(a) and h(a) functions as given in (14) and (15) provide a good architecture-independent method for understanding the potential rendering performance of a given geometry. The next section defines a rendering time prediction function, and will show how the function may be used to guide runtime load balancing.



Application to Rendering Control

In real-time simulation applications, an important feature is predictable, consistent frame rendering times. A historic technique to help achieve consistent frame rates is level-of-detail (LOD) objects. Use of LOD objects entails storing several alternate geometric representations of an object, sorted by polygon count. When the frame rate approaches or falls below a predetermined minimum threshold, the current representation of an LOD object can be changed to use one with a lower polygon count. Alternatively, view-dependent tessellations can be generated on the fly. These techniques work when the object is polygon processing overhead bound, but do not help when the object is fill rate bound.

In cases where the object is fill rate bound, the object may be deleted altogether. Alternatively, the graphics system may be configured to reduce the pixel area for a given object on a per-frame basis. Another alternative may be to dynamically reduce the display resolution, and thereby reduce the number of pixels in the entire scene. Selected background objects may be deleted in other embodiments.

For architectures that utilize super-sampling, the number of samples or the sample density for an object or the entire scene may be reduced as the frame rate drops. The function h(a) provides a formal method to determine the extent to which a given object is fill bound or overhead bound, and moreover how much this would change for other choices within an LOD object. This may advantageously allow for more global and accurate decisions to be made for frame rate control.

Real-time Prediction of Geometry Rendering Time 

The f and g functions can be used to define an accurate, real-time algorithm for predicting how long a given geometry will take to render. As part of off-line processing, pm() can be computed for a geometry, and from this f and g can be computed by numerical integration. Using a procedure like that which produced Figure 2, a separate off-line process can calculate several values of a_c for important sets of rendering attributes (for a given hardware architecture). Then at run-time, the scaling factor s can be computed from the modeling and viewing matrices, and the geometry's render time can be estimated by the following equation:

$$\text{render\_time} = \frac{\text{effective\_area}}{\text{pixel\_fill\_rate}} = \frac{a_c \cdot n \cdot \bigl(f(a_c s) + 1\bigr) + \tfrac{1}{2} \cdot s \cdot \text{total\_model\_area} \cdot \bigl(1 - g(a_c s)\bigr)}{\text{pixel\_fill\_rate}} \qquad (16)$$

The total screen area of the geometry from equation (2) can be estimated by multiplying the pre-computed total model space area by one half s. This was used in equation (16).
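A minimal sketch of how equation (16) might be evaluated at run time, assuming the off-line stage has already produced callables for f and g and a measured a_c. The function and parameter names are illustrative only, and f and g are evaluated at a_c · s exactly as equation (16) is written.

```python
def estimate_render_time(a_c, n, s, total_model_area, f, g, pixel_fill_rate):
    """Equation (16): estimated time (seconds) to render one geometry.

    a_c              critical area in pixels for the current attributes
    n                number of front-facing polygons
    s                model space to screen space scale factor
    total_model_area precomputed total model space area
    f, g             precomputed cumulative functions (callables)
    pixel_fill_rate  pixels per second
    """
    false_area = a_c * n * (f(a_c * s) + 1.0)
    true_area = 0.5 * s * total_model_area * (1.0 - g(a_c * s))
    return (false_area + true_area) / pixel_fill_rate


# Hypothetical inputs with constant stand-ins for the precomputed curves.
seconds = estimate_render_time(
    a_c=2.0, n=1000, s=4.0, total_model_area=10000.0,
    f=lambda a: 0.5, g=lambda a: 0.25, pixel_fill_rate=1.0e6,
)
```

In practice f and g would more plausibly be table lookups with interpolation, so that the per-frame cost stays at a handful of multiplications per geometry.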

Rendering times may also be calculated for a plurality of different viewpoints for one or more object variants. The calculated rendering times may then be averaged to obtain an overall estimated rendering time for each object variant. The graphics system may then select the most visually realistic object variants consistent with the desired minimum frame rate based on the average overall estimated rendering times. In another embodiment, the rendering times may be calculated for all possible viewpoints for an object variant and then averaged, or alternatively, only selected representative viewpoints (e.g., overlapping or non-overlapping symmetrical portions of the object or predefined views that have the highest likelihood of occurring) may be used. In another embodiment, the convolution function may be used to calculate and average the rendering times for all possible viewpoints of the object.
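The averaging-and-selection step described above might look like the following sketch. The data layout (a list of variants ordered from most to least visually realistic, each with per-viewpoint time estimates) and the names are assumptions made for illustration.

```python
def pick_variant(variants, frame_budget):
    """Return the name of the most realistic variant that fits the budget.

    variants: list of (name, per_viewpoint_times), ordered from most to
              least visually realistic; times are in seconds.
    frame_budget: time available for this object, in seconds.
    """
    for name, times in variants:
        average_time = sum(times) / len(times)
        if average_time <= frame_budget:
            return name
    # Nothing fits: fall back to the fastest-rendering variant.
    return variants[-1][0]


# Hypothetical estimates for two viewpoints of each variant.
variants = [("high", [0.030, 0.020]), ("low", [0.010, 0.012])]
chosen = pick_variant(variants, frame_budget=1.0 / 60.0)
```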

Frustum Clipping 

In some embodiments, the method for rendering time prediction may also be configured to take into account the effects of view frustum culling. View frustum culling refers to discarding polygons that are not within the current visible region of the display. On most modern machines, polygons outside the view frustum are trivially rejected in processing time similar to back-face culling, i.e., usually they will have a false area of a_c. Polygons that are actually clipped into one or more pieces generally take considerably longer, but are correspondingly rare, and their effect can usually be ignored. In some embodiments, an estimate of the fraction of the geometry that is outside the view frustum is made at run time. Letting this fraction be α, an updated render time prediction function follows:

$$\text{render\_time} = \frac{(1 - \alpha) \cdot \text{effective\_area} + \alpha \cdot a_c \cdot n}{\text{pixel\_fill\_rate}} \qquad (17)$$
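The frustum-culling correction can be sketched in a few lines. This assumes the culled portion costs α · a_c · n, i.e., each polygon outside the frustum is charged the false area a_c and n is the polygon count; the names are illustrative.

```python
def render_time_with_culling(effective_area, alpha, a_c, n, pixel_fill_rate):
    """Render time when a fraction alpha of the geometry is outside the frustum."""
    visible_cost = (1.0 - alpha) * effective_area  # rendered normally
    culled_cost = alpha * a_c * n                  # trivially rejected polygons
    return (visible_cost + culled_cost) / pixel_fill_rate


# Hypothetical values: a quarter of the geometry lies outside the frustum.
seconds = render_time_with_culling(
    effective_area=18000.0, alpha=0.25, a_c=2.0, n=1000, pixel_fill_rate=1.0e6
)
```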



Rendering Quality Control

In some other applications, the control constant may be image quality, rather than frame rate. Because rendered image quality is related to keeping the size of the majority of polygons below the Nyquist rate of the combination of the display system, the physical viewer's perception ability, and the image content, the curves also provide a formal method of controlling image quality. Specifically, a user may wish to choose a level of detail object such that for the current s, f(1) is 0.5 or less (e.g., to keep the median area sub-pixel). However, in general this threshold is a qualitative judgment choice, and for many geometries, including most of the example objects presented here, little perceivable quality is lost even choosing the median area to be as high as 2. The reason for this is that visually interesting high spatial frequencies tend to lie in the still large minority of polygons that are sub-pixel in size in such distributions. The fact that some polygons are larger than a single pixel does not violate the display's Nyquist rate; such polygons merely represent the lower spatial frequency areas that most objects have. This is similar to the statistical argument that successfully lets 2D image compression techniques not encode high frequency energy at all areas of most images, with few visually perceptible artifacts. Note also that most artifacts of Gouraud shading disappear for polygons that are only a few pixels in area. Many high quality software rendering packages use simple flat shading once polygons approach one pixel in size.
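One way to apply a median-area quality target is sketched below. It assumes each LOD is available as a list of model-space polygon areas and approximates the screen area of a polygon as one half s times its model-space area (the average-projection factor used for equation (16)); both the data layout and that per-polygon approximation are assumptions for illustration.

```python
from statistics import median


def pick_lod_for_quality(lods, s, max_median_area=1.0):
    """Return the index of the coarsest LOD whose median screen area is small enough.

    lods: list of lists of model-space polygon areas, ordered coarse to fine.
    s:    model space to screen space scale factor.
    """
    for index, model_areas in enumerate(lods):
        screen_median = median(0.5 * s * area for area in model_areas)
        if screen_median <= max_median_area:
            return index
    # No LOD meets the target: use the finest one available.
    return len(lods) - 1


# Hypothetical LODs; with s = 2 their median screen areas are 8, 2, and 0.5.
lods = [[8.0, 8.0, 8.0], [2.0, 2.0, 2.0], [0.5, 0.5, 0.5]]
strict = pick_lod_for_quality(lods, s=2.0)                        # sub-pixel median
relaxed = pick_lod_for_quality(lods, s=2.0, max_median_area=2.0)  # median up to 2
```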

Extending to Variable Resolution Displays 

Traditionally, level-of-detail object selection decisions are made temporally, based on size and object-importance information. However, in a variable-resolution sample buffer, the LOD decision may also take into account the minimum local pixel size in the area of the screen where a geometry is to be rendered. In the method outlined above, this is handled by the appropriate pre-scaling of s to match this local pixel size.

This empirical result, i.e., that most geometries do not increase in image quality once the median polygon size approaches one pixel, is useful in understanding the ultimate performance requirements for rendering. It is this one-or-fewer-polygons-per-pixel density number, when combined with human visual system limits and physical display device limits, that allows the method to estimate an appropriate maximum polygon rendering rate target for a given display device.

Limits of Human Vision 

The eventual consumer of all 3D rendering is the human visual system. With display technology and
real-time hardware rendering speeds ever increasing, graphics systems are on the threshold of surpassing the 
human visual system's input capabilities. On a machine with a single user and a sustained render frame rate of 
60 Hz, even present day CRTs exceed the maximum spatial frequency detection capability of the human 
visual system, in regions away from where the fovea is looking. The fovea is a region of the human retina 
that has the most acute visual perception. 

To take advantage of this situation, hardware rendering architectures may implement some form of 
variable resolution sample buffer. In such a sample buffer, the spatial resolution is not fixed, but is instead 
programmable (e.g., on a per-frame basis) to match the variable-resolution nature of human vision. Such 
pixels can be anti-aliased, and the anti-aliasing filter's frequency cut-off can also be configured to vary
dynamically to match the local effective pixel density. 

Highest resolution perceivable pixels: 28 seconds of arc

Several physical factors limit the highest spatial frequencies that can be perceived by the human eye. The diffraction limit of the pupil, the foveal cone spacing, and neural trace and physiological tests all confirm a maximum perceived frequency of approximately one cycle per arc-minute (half arc-minute pixels). This is under optimal (but non-vernier) conditions, including 100% contrast. While not quite directly comparable, so-called "20/20" vision represents detecting image features twice as large.

Vernier conditions are a common example of hyperacuity, e.g., when one can detect a shift as small as three seconds of arc in the angular position of a large visual object. Here the visual system is reconstructing higher spatial frequency information from a large number of lower frequency samples. However, the visual system can do the same for lower frequency rendered 3D graphics images so long as the higher spatial frequencies were present during the anti-aliasing process.

Variable resolution: 1/2@±1°, 1/4@±2°, 1/8@±5°, 1/16@±12° - Figures 17A-B

This high resolution, however, applies only to the central 2° of vision. Outside of this region, the cone spacing and measured perceptional acuity drop off even faster than the optical limits. In many textbooks, this drop-off is plotted as a sharp cusp. However, this representation does not do justice to how small the high spatial frequency perception region of the visual field is. Figure 17A plots an alternate visualization of this data onto the surface of a unit sphere: which portions of the 4π steradian field of view are perceived at what resolution. There are 5 false color bands, each corresponding to a factor of two less perceptual resolution. Figure 17B is an enlargement of the central region of the sphere. The centermost region corresponds to the central ±1° of the fovea. The second region extends from there to ±2°, the third region to ±5°, the fourth region to ±12°, and the fifth region to the optical edge caused by the human face. The white represents the non-visible regions. This optical edge has a complex shape, and varies both in the individual and the literature. For these calculations, data from the article "Visual Processing and Partial-Overlap Head Mounted Displays," by Scott Grigsby and B. Tsou, from the Journal of the Society for Information Display, 2, 2 (1994), 69-74 was used. The data has maximum fields of view that vary horizontally from -59° to +110°, and vertically from -70° to +56°. To show both sides of this more than 180° field, two unit spheres are shown, one for a right eye and one for a symmetrically-reversed left eye. Thus, if the direction of gaze is known, across the entire visual field, the human visual system can perceive approximately only one fifteenth the visual detail that would be discernible if foveal resolutions were available for the entire field.

To understand the possible impact this may have on 3D graphics systems, Figure 18 is a table presenting a comparison of estimated visual and display parameters for several representative display devices. In the table, column 400 represents various display devices. The rectangular displays are characterized by their diagonal measurement and typical user viewing distance. The bottom two entries are the pure limits of the visual system, and a non-tracked visual system (Full Sphere). Column 402 represents the displays' pixel resolution. The movie resolution is an empirical number for 35-mm production film. These numbers also determine the aspect ratio of the device. Column 404 represents the displays' pixel size. This is the angular size of a single display pixel in minutes of arc. Column 406 represents the displays' total solid angle visual field of view (FOV) in units of steradians. Column 408 represents the maximum human-perceivable pixels within the field of view, assuming uniform 28 second of arc perception. This is simply the number of pixels of the size in column 404 that fit within the steradians of column 406. Column 410 represents the same information as the previous column, but for more practical 1.5 arc-minute perception pixels. Column 412 represents the maximum human-perceivable pixels within the field of view, assuming the variable resolution perception of Figures 17A-B. Column 414 represents the pixel limit of the display itself (multiplication of the numbers from column 402). Column 416 represents the number of perceivable pixels taking both the display and eye limits into account. This was computed by checking, for each area within the display FOV, which was the limit, the eye or the display, and counting only the lesser. Column 418 represents the limits of the previous column as maximum polygon rendering rates (in units of billions of polygons per second), using additional models developed further below.

To compute many of the numbers in Figure 18, the unit sphere was broken up into 216 small sections, each with its own local maximum perceptible spatial frequency. Numerical integration was then performed on the intersection of these sections and the display FOV edges (or, in the case of the full eye, the edge of the visual field). The angular size of uniform pixels on a physically flat display is not a constant, i.e., the pixels will become smaller away from the axis. The effect is minor for most displays, but becomes quite significant for very large field of view displays. However, for simplicity this effect was not taken into account in the numbers in the table, as real display systems address this problem with multiple displays and/or optics.

There are several items of note in this table. The FOV of a single human eye is about one third of the entire 4π steradian FOV. A wide-screen movie is only a twentieth of the eye's FOV, and normal television is less than a hundredth. A hypothetical spherical display about a non-tracked rotating (in place) observer would need over two thirds of a billion pixels to be rendered and displayed (per eye) every frame to guarantee full visual system fidelity. An eye-tracked display would only require one forty-fifth as many rendered pixels, as the perception limit on the human eye is only about 15 million variable resolution pixels.
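The full-sphere figure can be checked with a short calculation. This sketch assumes each perceivable pixel is a square 28 seconds of arc on a side, and takes the 14.78 million variable-resolution pixel count from the saturation calculation in equation (19); the function name is illustrative.

```python
import math

ARCSEC_PER_RADIAN = math.degrees(1.0) * 3600.0


def pixels_in_solid_angle(steradians, pixel_arcsec):
    """Number of square pixels of the given angular size covering a solid angle."""
    square_arcsec = steradians * ARCSEC_PER_RADIAN ** 2
    return square_arcsec / pixel_arcsec ** 2


# Full 4*pi steradian sphere at uniform 28 arc-second resolution.
full_sphere = pixels_in_solid_angle(4.0 * math.pi, 28.0)

# Ratio to the variable-resolution limit of the eye (14.78 million pixels).
ratio = full_sphere / 14.78e6
```

The result is roughly 680 million pixels, consistent with "over two thirds of a billion", and the ratio comes out near the one forty-fifth quoted above.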

The Limits of Rendering

The following simple model provides one estimate of the maximum rendering rate that may be needed for a real-time system:

$$\Delta/\text{second} = \text{frame rate} \cdot (\text{number of eyes}) \cdot \text{screen pixels} \cdot \text{depth complexity} \cdot \Delta/\text{pixel} \qquad (18)$$

An empirical estimate of this last term is approximately one. The previous section developed estimates of screen pixels based on displays and perception. Frame rate has not been extensively discussed, other than an assumption that it is at or above 60 Hz. Very little is known about the interaction of rapidly-varying complex rendered images with the human visual system. Currently, a good approach is to pick a value that is estimated to be high enough. Some have even speculated that very high rendering frame rates (in excess of 300 Hz) may interact more naturally with the human visual system to produce motion blur effects than the traditional computer graphics techniques.

The pixels referenced in the table are assumed to be anti-aliased with a high quality resampling filter, either based on super-samples or area coverage techniques. The pixel counts in the table may be multiplied by the super-sampling density to obtain counts of samples rather than pixels. The polygon statistics touched only peripherally on depth complexity. However, assuming reasonable occlusion culling techniques, current experience is that depth complexity in many embodiments can be kept in a range of 3 to 6 in most (but by no means all) cases. For purposes of example, a depth complexity of 6 is assumed. These assumptions were used to compute column 418 of the table using equation (18) (using two eyes for stereo displays). The numbers are in units of billions of polygons per second. Under these assumptions, the maximum polygon rendering rate required to saturate the human visual system is:

$$60\ \text{Hz} \cdot 2\ \text{eyes} \cdot 14.78\text{M pixels} \cdot 6\ \text{DC} \cdot 1\ \Delta/\text{pixel} = 10.64\text{B}\ \Delta/\text{sec} \qquad (19)$$

This is just over ten billion polygons per second. For most traditional display devices, the saturation number is under half a billion polygons per second. The numbers presented here are neither theoretical minimum nor maximum calculations; they are conservative "average case" estimates, and changes in any of the assumptions can have a large effect on the results.
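Equation (18) is a straight product, so the sensitivity to each assumption is easy to explore. The sketch below reproduces the saturation figure of equation (19); the function and parameter names are illustrative.

```python
def polygons_per_second(frame_rate_hz, eyes, screen_pixels,
                        depth_complexity, polygons_per_pixel=1.0):
    """Equation (18): polygon rate needed to sustain the given display load."""
    return (frame_rate_hz * eyes * screen_pixels
            * depth_complexity * polygons_per_pixel)


# Inputs from equation (19): 60 Hz, stereo, 14.78M pixels, depth complexity 6.
saturation_rate = polygons_per_second(60.0, 2, 14.78e6, 6)

# The same display load at the low end of the quoted depth complexity range.
lower_rate = polygons_per_second(60.0, 2, 14.78e6, 3)
```

Halving the depth complexity from 6 to 3 halves the required rate, which illustrates how strongly the estimate depends on the stated assumptions.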

Log plots of cumulative statistics of screen space polygons may give useful insight into understanding the empirical behavior of polygons. This information can be used in making trade-offs in the design of real-time rendering hardware, and in the design of triangulated objects to be rendered in real-time. It was shown how these screen space statistics could be directly computed from their model space versions. These same functions can be used to define rendering performance functions in an architecture-independent manner, using the concept of false area. Refined versions of these performance functions can be used in managing the frame rate or controlling the quality of real-time rendering. Near-optimal visual quality requires the median polygon to stay near a single pixel in size.

Turning now to Figure 19A, one embodiment of a method to efficiently calculate rendering speeds for sets of three-dimensional graphics data is shown. In this embodiment, a number of values from the equations outlined above are precalculated. Note while the flowchart illustrates these calculations being performed in a serial manner, they may also be performed in parallel or in a different order than that depicted in the figure. First, values for a_c, pm(x), f(a_c s), g(a_c s), s, and total model space area are calculated (steps 300-308). Next, if view frustum culling is to be considered (step 310), then α is calculated in real-time or near real-time (step 314). Then, the rendering time is calculated as the effective area divided by the pixel fill rate (step 312). As with optional step 314, step 312 may be performed in real-time or near real-time to allow frame-by-frame calculation of rendering times.

Next, the calculated rendering time is compared with the desired frame rate (step 320). If the calculated rendering time is fast enough to meet the predetermined minimum frame rate, then the graphics system may render the frame with the current parameters (step 322). If the calculated render time is too slow to meet the desired frame rate, equation (15) may be utilized to determine if the frame will be fill rate bound or polygon overhead bound (step 326). If the frame is polygon bound, then the graphics system may modify the rendering parameters to reduce the number of polygons (step 330). As previously described, this may be accomplished in a number of ways (e.g., by selecting a LOD with fewer polygons, or by dynamically tessellating the object into fewer polygons). If the frame is pixel fill bound, then the graphics system may be configured to modify the rendering parameters to reduce the number of pixels or samples (step 332). As previously described, this may also be accomplished in a number of different ways, including changing the number of samples calculated per pixel (in a super-sampled system) or by dynamically changing the size of the object or frame being rendered. Another alternative may be to discard certain background objects (e.g., those designated as less important by the software application that generated the frame).
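The decision portion of this flow (steps 320 through 332) can be sketched as a small function. The 0.5 cut-off on h used to separate polygon-overhead-bound from fill-bound frames is an assumption chosen for illustration, as are the function name and the returned labels.

```python
def choose_adjustment(render_time, min_frame_rate, h, overhead_threshold=0.5):
    """Decide how to react to an estimated render time for one frame.

    render_time:    estimated seconds to render the frame
    min_frame_rate: predetermined minimum frame rate in Hz
    h:              false area fraction from equation (15), between 0 and 1
    """
    frame_budget = 1.0 / min_frame_rate
    if render_time <= frame_budget:
        return "render"            # step 322: current parameters are fine
    if h >= overhead_threshold:
        return "reduce_polygons"   # step 330: polygon overhead bound
    return "reduce_pixels"         # step 332: pixel fill bound


# Hypothetical frame: 30 ms estimate against a 60 Hz target, h = 0.9.
action = choose_adjustment(render_time=0.030, min_frame_rate=60.0, h=0.9)
```

A system could call this once per frame or once per object, then re-estimate after applying the adjustment until the estimate fits the budget.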

This method may be implemented in hardware or software, or a combination thereof. The calculations may be performed on a per-frame basis (i.e., real-time), or on a less-frequent basis (near-real time), or offline. Some embodiments may perform all calculations in real-time, or, as described above, a number of values may be pre-calculated for a particular graphics data set with only the final calculations being performed in real-time. Still other embodiments may perform all calculations offline.

Turning now to Figure 19B, one embodiment of a set of graphics data is shown. As the figure illustrates, in this example graphics data set 450 comprises a plurality of general objects 560A-C. Each general object in turn comprises a plurality of object variants 570. These object variants may themselves comprise a plurality of polygons and corresponding rendering attribute information (e.g., textures). The object variants may correspond to differing levels of detail (LOD), and may be selected before or during the rendering process to achieve a predetermined minimum frame rate (e.g., in step 332 of Figure 19A). For example, object variant 576A may correspond to the sphere in Figure 1B, while object variant 576B may correspond to the sphere in Figure 1A. Furthermore, object variant 576A may contain rendering attributes such as a marble texture that is to be texture mapped onto the sphere, while object variant 576B may comprise a rendering attribute of simple shading with no texture. Some object variants may share the same polygons and may vary solely by rendering attributes. Similarly, other object variants may share the same rendering attributes and may vary by polygon count. Some general objects may have only one object variant, while others may have a large number of variants.

During rendering, the graphics system may be configured to calculate an estimated rendering time for each object variant of all general objects to be rendered and then select the most visually-realistic object variants consistent with a predetermined minimum frame rate. While the object variant selected for a particular general object may vary from frame to frame, the graphics system may consider a number of factors in selecting which object variant to render for a particular general object. For example, the graphics system may consider the general object's position. General objects that are in the background may be given a lower priority or importance and thus may have a faster-rendering object variant selected. In a flight simulator, for example, a general object corresponding to a tree in a forest in the background may be given less priority (making it more likely that the graphics system will select a faster-rendering, less visually-realistic object variant) than another general object corresponding to an enemy aircraft in the immediate foreground. The graphics system may also be configured to use hysteresis when selecting object variants. For example, assume object variant 576A had been selected for general object 560C for the previous 200 frames, and then performance limits forced the graphics system to select object variant 576B to maintain a predetermined minimum frame rate. After rendering object variant 576B for one frame, the graphics system may be configured to continue to select object variant 576B for the next few frames, even if the system's performance limits would allow it to select the more visually realistic object variant 576A. This hysteresis may advantageously prevent unwanted flickering that may occur if different object variants for a particular general object are selected in rapid succession.
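The hysteresis behavior described above might be sketched as follows. Variants are identified here by an index where 0 is the most visually realistic and larger indices render faster; the class name, that indexing, and the `hold_frames` tuning value are all assumptions for illustration.

```python
class VariantSelector:
    """Holds a faster-rendering variant for a few frames before upgrading."""

    def __init__(self, hold_frames=3):
        self.hold_frames = hold_frames  # frames to wait before upgrading
        self.current = None             # index of the variant now in use
        self.hold_left = 0              # remaining frames in the hold period

    def select(self, affordable):
        """affordable: most realistic variant index the budget allows this frame."""
        if self.current is None:
            self.current = affordable
        elif affordable > self.current:
            # Performance limits force a faster variant: switch immediately
            # and start the hold period.
            self.current = affordable
            self.hold_left = self.hold_frames
        elif affordable < self.current:
            # A more realistic variant is affordable: wait out the hold first.
            if self.hold_left > 0:
                self.hold_left -= 1
            else:
                self.current = affordable
        return self.current


# Variant 0 is affordable except for a single overloaded frame.
selector = VariantSelector(hold_frames=2)
history = [selector.select(i) for i in [0, 0, 1, 0, 0, 0, 0]]
```

Downgrades take effect at once so the frame rate is protected, while upgrades are delayed, which is the asymmetry that suppresses flicker.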

Proof of Constancy of Projection 

Assuming that a unit-area world-space polygon is viewed orthographically from a random viewpoint, then for the purposes of computing the projected area, only the angle between the polygon's facet normal and the view direction matters. The screen space area of the projected polygon for a given view will be just the cosine of the angle (the inner product of the normalized vectors). Thus, an equal probability distribution of all possible views is just a uniform distribution of directions. This can be represented as uniformly distributed points on a unit sphere. Without loss of generality, it suffices to only consider a hemisphere of points, as half the view directions will be back facing.

Points with the same inner product, and corresponding to views that will produce the same area, will all lie on the same latitude on the hemisphere. The "thickness" of lower latitudes exactly offsets the change in circumference, resulting in equal probabilities of view angles (thus areas).

Figure 20 shows a cross section of this unit hemisphere. The measure of a band of constant height da at latitude θ is 2π · sin θ · dθ. However, dθ = da/(sin θ), so the sines cancel and the number of points in any band is independent of θ. Note however, that this constancy result may only hold in three dimensions.
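The constancy result can be checked numerically. The sketch below draws uniformly distributed view directions (by normalizing three Gaussian samples, a standard technique that is not taken from the text), folds back-facing directions onto the front hemisphere, and records the projected area of a unit polygon whose normal is the z axis. If the result holds, the areas should be uniformly distributed between 0 and 1, with a mean of one half.

```python
import math
import random


def projected_areas(samples, seed=0):
    """Projected areas of a unit polygon seen from random view directions."""
    rng = random.Random(seed)
    areas = []
    for _ in range(samples):
        # A normalized Gaussian triple is uniformly distributed on the sphere.
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        length = math.sqrt(x * x + y * y + z * z)
        # |cos| of the angle between the view direction and the normal (0, 0, 1).
        areas.append(abs(z) / length)
    return areas


areas = projected_areas(100_000)
mean_area = sum(areas) / len(areas)
fraction_below_quarter = sum(1 for a in areas if a < 0.25) / len(areas)
```

With a uniform distribution the mean is close to 0.5 and about a quarter of the samples fall below 0.25, which is consistent with the factor of one half s used to estimate total screen area.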



Example Computer Network - Figure 21

Figure 21 illustrates an example computer network 500 comprising at least one server computer 502 and one or more client computers 506A-N (in the embodiment shown in Figure 21, client computers 506A-B are depicted). One or more of the client systems may be configured similarly to computer system 80, each having one or more graphics systems 112 as described above. Each may further be configured to perform rendering time estimates as described above. Server 502 and client(s) 506 may be joined through a variety of connections 504, such as a local-area network (LAN), a wide-area network (WAN), or an Internet connection. In one embodiment, server 502 may store and transmit 3-D geometry data (which may be compressed) to one or more of clients 506. The clients 506 receive the 3-D geometry data, decompress it (if necessary), estimate the rendering time, and then render the geometry data (with modified rendering parameters as necessary). Note as used herein, rendering parameters comprise: the number of pixels or samples in the object/scene/image being rendered, the number of samples per pixel, the color depth, the texture parameters, the number of lights (and their corresponding properties), special rendering effects (e.g., transparency, anti-aliasing, fogging, blur effects), and the number of objects rendered. The rendered image is then displayed on the client's display device. The clients may render the geometry data and display the image using standard or super-sampled sample buffers as described above. In another embodiment, the compressed 3-D geometry data may be transferred between client computers 506.

Although the embodiments above have been described in considerable detail, other versions are 
possible. Numerous variations and modifications will become apparent to those skilled in the art once the 
above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all
such variations and modifications. Note that the headings used herein are for organizational purposes only and are
not meant to limit the description provided herein or the claims attached hereto.

Industrial Exploitation 

The previously described method and computer software program may be exploited in computer
hardware, graphics devices, personal digital assistants (PDAs), appliances with graphics displays, graphics
processors, graphics sub-systems, virtual reality systems, head-mounted displays, computer games, and other
electronic devices that generate, display, transmit or manipulate 3D graphics data. The method and computer
software program may be implemented in hardware, software, or any combination thereof.

What is claimed is:

1. A computer software program for estimating rendering times in a graphics system embodied on a
carrier medium, wherein said software program is configured to estimate performance for a graphics system
for polygon rendering, wherein said software program comprises a plurality of instructions configured to:

calculate a rendering time for a set of graphics data; and, if the rendering time exceeds that specified by a
predetermined minimum frame rate, then

determine whether the graphics data is polygon overhead limited or pixel fill limited, and 
change rendering parameters accordingly to achieve said predetermined minimum frame rate. 
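
The control flow recited in claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the rule used to decide which limit applies (comparing the area charged to minimum-size polygon overhead against the area actually filled) is an assumption, and the function names and returned labels are hypothetical.

```python
def classify_bottleneck(real_area, false_area):
    """Assumed rule: overhead-limited when padded (false) area exceeds real fill area."""
    return "polygon_overhead" if false_area > real_area else "pixel_fill"

def choose_adjustment(render_time, min_frame_rate, real_area, false_area):
    """Return which kind of rendering parameter to change, or 'keep' if on budget."""
    frame_budget = 1.0 / min_frame_rate  # seconds available per frame
    if render_time <= frame_budget:
        return "keep"
    if classify_bottleneck(real_area, false_area) == "polygon_overhead":
        # Fewer polygons (e.g. a lower level of detail) reduces per-polygon overhead.
        return "reduce_polygon_count"
    # Fewer pixels or samples per pixel reduces fill work.
    return "reduce_pixel_area_or_samples"

print(choose_adjustment(0.05, 60, real_area=100.0, false_area=900.0))
```

Separating the classification from the adjustment keeps the assumed bottleneck rule easy to swap for whatever test a given graphics system actually supports.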

2. The computer software program as recited in claim 1, wherein said set of graphics data comprises a 
plurality of general objects, wherein each general object comprises a plurality of object variants, wherein each 
object variant comprises a plurality of polygons and rendering attributes, and wherein said plurality of 
instructions are further configured to change the rendering parameters by selecting a different object variant 
for one or more of the general objects. 

3. The computer software program as recited in claim 2, wherein said plurality of instructions are
further configured to calculate a rendering time for each object variant of each general object and select the
most visually realistic object variant for each general object to be rendered consistent with said predetermined
minimum frame rate.
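
One way to picture the variant selection of claims 2 and 3 is a greedy downgrade loop. The sketch below is an assumption for illustration only: the claims do not prescribe a search strategy, the greedy heuristic is a substitute chosen for brevity, and `select_variants` and the sample timings are hypothetical.

```python
def select_variants(objects, min_frame_rate):
    """Pick a variant index per object so the summed estimate fits the frame budget.

    objects maps an object name to its estimated rendering times in seconds,
    ordered from the most to the least visually realistic variant.
    """
    budget = 1.0 / min_frame_rate
    choice = {name: 0 for name in objects}  # start with the most realistic variants

    def total_time():
        return sum(objects[name][idx] for name, idx in choice.items())

    while total_time() > budget:
        # Only objects that still have a cheaper variant can be downgraded.
        candidates = [n for n, i in choice.items() if i + 1 < len(objects[n])]
        if not candidates:
            break  # nothing left to downgrade; the budget cannot be met
        # Greedy heuristic (assumed): downgrade the currently most expensive object.
        worst = max(candidates, key=lambda n: objects[n][choice[n]])
        choice[worst] += 1
    return choice

scene = {"car": [0.020, 0.010, 0.004], "tree": [0.008, 0.003]}
print(select_variants(scene, min_frame_rate=50))
```

Considerations such as hysteresis, object importance, or object position (claim 4) would enter through the `key` used to pick which object to downgrade.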

4. The computer software program as recited in claims 2 or 3, wherein said plurality of instructions are 
further configured to select the most visually realistic object variant for each general object to be rendered 
based upon one or more considerations selected from the group comprising: consistency with said 
predetermined minimum frame rate, hysteresis, each object's relative importance, and each object's position.

5. The computer software program as recited in claim 3, wherein a subset of general objects are given 
higher priority than other general objects. 

6. The computer software program as recited in claims 2, 3 or 5, wherein said plurality of instructions 
are further configured to calculate a cumulative probability density distribution f(a) for at least one of the 
object variants, wherein f(a) is the probability of a randomly chosen polygon within the object variant having
an area a or less. 

7. The computer software program as recited in claim 6, wherein said plurality of instructions are
further configured to calculate a cumulative area g(a) for all polygons in at least one of the object variants,
wherein g(a) is the ratio of the amount of total surface area accounted for by polygons within the object
variant having an area a or less over the total surface area of the object variant.

8. The computer software program as recited in claim 6, wherein said plurality of instructions are
further configured to average f(a) over multiple different viewing angles.

9. The computer software program as recited in claim 6 wherein said plurality of instructions are further 
configured to: 

divide the possible viewpoints for one or more of the general objects into multiple different sets of viewing 
angles, 

calculate average values for f(a) for each different set of viewing angles, and

select one value of the average values for f(a) based on the current viewpoint for use in calculating the
rendering time.

10. The computer software program as recited in claims 7, 8 or 9, wherein said plurality of instructions
are further configured to average g(a) over multiple different viewing angles. 

11. The computer software program as recited in claim 7, wherein said plurality of instructions are
further configured to calculate an aspect ratio for each polygon, wherein said aspect ratio for each polygon is
the ratio of the polygon's height divided by the polygon's width.

12. The computer software program as recited in claim 11, wherein said plurality of instructions are
further configured to calculate a skew for each polygon, wherein said skew for each polygon is the polygon's
corner width divided by the polygon's width.

13. The computer software program as recited in claim 12, wherein g(a), the aspect ratio, and the 
skew are each calculated twice, once off-line in model space and once in real-time for screen space. 

14. The computer software program as recited in claims 1, 2, 12 or 13, wherein said plurality of 
instructions are further configured to calculate rendering times for a plurality of different viewpoints for each 
object variant and then average the calculated rendering times for said plurality of different viewpoints for 
each object variant. 

15. The computer software program as recited in claim 14, wherein said plurality of instructions are
configured to calculate rendering times for all possible viewpoints for each object variant and then average the
calculated rendering times.

16. The computer software program as recited in claim 15, wherein said plurality of instructions are 
configured to calculate rendering times for all possible viewpoints for the object variant by calculating 
rendering times for representative viewpoints for symmetrical portions of the object variant and averaging the 
calculated rendering times. 




17. The computer software program as recited in claims 15 or 16, wherein said plurality of instructions
are further configured to calculate and average the rendering times for all possible viewpoints for the object 
variant by performing a convolution calculation. 

18. The computer software program as recited in claims 1, 2, or 16, wherein said rendering parameters
are selected from the group consisting of: sample density, samples per pixel, number of pixels, lighting
effects, number of light sources, level of detail, number of polygons, anti-aliasing, fogging, texture mapping
parameters, programmable shaders, shading parameters, and color depth.

19. The computer software program as recited in claims 1 or 2, wherein said carrier medium is a 
computer-readable media or a transmission media.

20. A method for estimating a graphics system's rendering performance for a particular set of geometry 
data, said method comprising:

determining the graphics system's pixel fill rate; and

calculating the graphics system's per-frame rendering time for the geometry data, wherein the geometry data
comprises a plurality of polygons, wherein said per-frame rendering time is the effective area of the geometry
data divided by the pixel fill rate, wherein the effective area equals the sum of the areas of all front-facing
polygons in the geometry, wherein the area of each front-facing polygon in the geometry having an area less
than a predetermined area a_c is rounded up to the predetermined area a_c, and wherein said predetermined area a_c
is a constant describing the performance of the graphics system.

21. The method as recited in claim 20, wherein said per-frame rendering time further includes an 
adjustment for back facing polygons, wherein said adjustment is an approximation of the number of back 
facing polygons multiplied by the predetermined area a_c and the ratio of back-face processing times to
minimal polygon rendering times.
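
The estimate recited in claims 20 and 21 can be written out directly. The sketch below is one reading of the claim language, assuming the back-face adjustment is expressed as an area and added to the effective area before dividing by the fill rate; the function name, parameter names, and sample numbers are hypothetical.

```python
def estimate_frame_time(front_areas, n_back, a_c, fill_rate, back_ratio):
    """Estimate per-frame rendering time in seconds.

    front_areas: screen-space areas (pixels) of the front-facing polygons
    n_back:      approximate number of back-facing polygons
    a_c:         system constant; smaller polygons are rounded up to this area
    fill_rate:   pixel fill rate in pixels per second
    back_ratio:  back-face processing time over minimal polygon rendering time
    """
    # Each front-facing polygon costs at least a_c pixels worth of time.
    effective_area = sum(max(area, a_c) for area in front_areas)
    # Assumed reading of claim 21: back faces are charged as a fraction of a_c each.
    back_face_adjustment = n_back * a_c * back_ratio
    return (effective_area + back_face_adjustment) / fill_rate

t = estimate_frame_time([10, 50, 200], n_back=2, a_c=38, fill_rate=1000, back_ratio=0.5)
print(t)
```

In this example the 10-pixel polygon is charged as 38 pixels, so small polygons dominate the cost once their count grows, which is the polygon overhead limited case.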

22. The method as recited in claims 20 or 21, wherein said set of graphics data comprises a plurality of 
general objects, wherein each of said general objects comprises a plurality of object variants, wherein said 
method further comprises selecting one of said object variants having a lower polygon count for rendering if 
said per-frame rendering rate falls below a predetermined minimum value and said graphics system 
performance is polygon overhead bound. 

23. The method as recited in claim 21, further comprising causing said graphics system to reduce the
pixel area of the set of graphics data if said per-frame rendering time results in a frame rate that falls below a
predetermined minimum value and said graphics system performance is fill rate bound.

24. The method as recited in claim 21, wherein said graphics system is capable of varying sample
densities, and wherein said method further comprises reducing the graphics system's sample density used to
render at least part of the graphics data if said per-frame rendering time results in a frame rate that falls below
a predetermined minimum value and said graphics system performance is fill rate bound.

25. The method as recited in claims 21 or 24, further comprising reducing the pixel area of the graphics 
data by reducing the complexity of texture calculations performed on at least part of the graphics data if said 
per-frame rendering rate falls below a predetermined minimum value and said graphics system performance is 
fill rate bound. 

26. The method as recited in claim 21, further comprising selecting a set of graphics data with a higher
polygon count for rendering if said per-frame rendering rate rises above a predetermined maximum value and
said graphics system performance is not polygon overhead bound.

27. The method as recited in claims 21 or 26, further comprising causing said graphics system to 
increase the pixel area of the graphics data if said per-frame rendering rate rises above a predetermined 
maximum value and said graphics system performance is not fill rate bound. 

28. The method as recited in claims 21 or 26, further comprising increasing the pixel area of the graphics 
data by video resizing if said per-frame rendering rate rises above a predetermined maximum value and said
graphics system performance is not fill rate bound. 

29. The method as recited in claims 21 or 26, further comprising reducing the pixel area of the graphics 
data by deleting one or more background objects if said per-frame rendering rate rises above a predetermined 
maximum value and said graphics system performance is not fill rate bound.

30. The method as recited in claims 21 or 26, further comprising reducing the sample density used to 
render at least part of the graphics data if said per-frame rendering rate rises above a predetermined maximum 
value and said graphics system performance is not fill rate bound. 

31. The method as recited in claims 21 or 26, further comprising reducing the pixel area of the graphics
data by reducing the complexity of texture calculations performed on at least part of the graphics data if said
per-frame rendering rate rises above a predetermined maximum value and said graphics system performance
is not fill rate bound.

32. A method for predicting the approximate rendering time for a graphics system to render a particular 
set of geometry data, the method comprising: 

determining a pixel fill rate for the graphics system;

calculating an effective area for the particular set of graphics data to be rendered; and

estimating the geometry's render time in real time by dividing the effective area by the pixel fill rate.

33. The method as recited in claim 32, wherein said calculating the effective area comprises:

calculating a real area for the set of geometry data, wherein said real area corresponds to the surface area of
polygons in said geometry data that are equal to or above a predetermined constant a_c;

calculating a false area for the set of geometry data, wherein said false area approximates the effects of
minimum polygon overhead and corresponds to the predetermined constant a_c, multiplied by the number of
polygons in said geometry data that are smaller than the predetermined constant a_c; and

summing said real area and said false area. 

34. The method as recited in claim 33, wherein said calculating the real area comprises: 

evaluating a function g(a_c s) that approximates the total surface area accounted for by polygons having areas
less than or equal to the product of a_c and s, wherein a_c is a predetermined constant, and wherein s is a model
space to screen space scaling factor that is assumed to be one for the calculation of g;

pre-calculating a total screen area for the set of geometry data;

calculating a model space to screen space scaling factor s; and

multiplying said total area with (1 - g(a_c s)) performing numerical integration.
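
The real-area and false-area decomposition of claims 33 and 34 can be illustrated with the distribution values f and g evaluated at the threshold. The sketch below is a simplification under stated assumptions: all areas are taken to be in screen space already, so the scaling factor s is treated as one and omitted, and the function name and sample numbers are hypothetical.

```python
def real_and_false_area(areas, a_c):
    """Split the effective area into real and false parts using f and g at a_c."""
    count = len(areas)
    total_area = sum(areas)
    small = [a for a in areas if a <= a_c]
    f_at_ac = len(small) / count       # fraction of polygons with area a_c or less
    g_at_ac = sum(small) / total_area  # fraction of total area those polygons hold
    # Real area: what is left after removing the area of the small polygons.
    real_area = total_area * (1.0 - g_at_ac)
    # False area: each small polygon is charged the minimum area a_c instead.
    false_area = a_c * count * f_at_ac
    return real_area, false_area

areas = [10.0, 50.0, 200.0]
real_area, false_area = real_and_false_area(areas, a_c=38.0)
print(real_area + false_area)
```

A polygon whose area equals a_c exactly contributes a_c whether it is counted as real or false, so the choice of boundary does not change the sum. The result matches rounding each small polygon up to a_c and summing, as in claim 20.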

35. The method as recited in claims 32, 33 or 34, wherein said estimating and said calculating are
performed in real time, and wherein said determining is performed off-line.